<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Empirical Assessment and Characterization of Homophily in Classes of Hate Speeches</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Seema Nagar</string-name>
          <email>seema.nagar@iiitg.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sameer Gupta</string-name>
          <email>sameer.lego@gmail.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>C.S. Bahushruth</string-name>
          <email>bahushruth.bahushruth@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ferdous Ahmed Barbhuiya</string-name>
          <email>ferdous@iiitg.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kuntal Dey</string-name>
          <email>kuntal.dey@accenture.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Accenture Technology Labs</institution>
          ,
          <addr-line>Bangalore</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Indian Institute of Information Technology</institution>
          ,
          <addr-line>Guwahati</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Manipal University</institution>
          ,
          <addr-line>Jaipur</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>National Institute of Technology</institution>
          ,
          <addr-line>Kurukshetra</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we investigate homophily in hate speech generation on social media platforms. Homophily plays a significant role in information diffusion, sustenance of online guilds, contagion in product adoption, and the emergence of topics and their life cycle on social networks. In the real world, the features used in similarity computation are well defined, but on social media platforms they are not. We note that similarity among users can be defined along multiple aspects, such as profile metadata, the content generated, and the style of writing. Features derived from these aspects are capable of capturing similarity along multiple dimensions, primarily semantic, lexical, syntactic, stylometric and topical. We leverage the important features for authorship attribution, word embeddings, and latent and Empath topics to compute lexical, syntactic, stylometric, semantic and topical features. We empirically demonstrate the presence of homophily on a dataset from Twitter along the different aspects of similarity. Further, we investigate how homophily varies across different types of hate, such as hate manifesting in topics of gender, race, ethnicity, politics and nationalism. Our results indicate higher homophily among users associated with topics of racism and nationalism.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Social media platforms such as Twitter have enabled
people to generate content in unprecedented ways. These
platforms are now also widely used to generate hate speech.
Hate speech campaigns have often incited real-life violence
amongst people
        <xref ref-type="bibr" rid="ref17">(Ribeiro et al.
2017; Mathew et al. 2018)</xref>
        . There are many cases where
countries have blamed social media platforms for inciting
crimes in society. Facebook has been blamed for instigating
anti-Muslim mob violence in Sri Lanka as well as for
playing a leading role in the possible genocide of the Rohingya
community in Myanmar. Therefore, studying multiple social
aspects of hate speech such as diffusion, dissemination, and
consumption is a critical problem.
      </p>
      <p>
        <xref ref-type="bibr" rid="ref14">(McPherson, Smith-Lovin, and Cook 2001)</xref>
        proposed
homophily on social networks, using the assortative mixing
hypothesis. Homophily on social networks is defined as
“similarity breeds familiarity”. It plays a significant role in
information diffusion on social networks
        <xref ref-type="bibr" rid="ref12 ref19 ref3 ref6">(Aral, Muchnik, and
Sundararajan 2009; De Choudhury et al. 2010; Halberstam
and Knight 2016; Starbird and Palen 2012)</xref>
        . The importance
of homophily in information diffusion motivated us to
empirically assess its presence in hate speech generation.
Works such as
        <xref ref-type="bibr" rid="ref17">(Ribeiro et al. 2017; Mathew et al. 2019)</xref>
        study the positional aspect of hateful users in the social
network. However, the literature has not explored homophily,
a crucial aspect. We further investigate the strength of
the homophilic phenomenon for different types of hate, such as
hate against gender, race, politics and ethnicity.
      </p>
      <p>Two predominant factors are needed to assess homophily:
familiarity and similarity, both of which are naturally present on
social media platforms. Familiarity captures the phenomenon
of users becoming friends of (or following) other users.
Similarity is the phenomenon where a given user is similar
to another user in the context of a given objective, such as
generating hateful content or participating in the same topic.
While familiarity on Twitter can be inferred using
follower-followee or retweet social networks, similarity computation
is not straightforward. We believe the similarity between a
pair of users should take into account multiple aspects of the
generated content, in addition to the profile metadata. The
aspects we explore in this paper are semantic, syntactic,
stylometric and topical. We empirically investigate homophily
along these aspects in hate speech generation.</p>
      <p>
        We use word embeddings to compute semantic features
for a user. The word embeddings are aggregated in a
time-decaying manner to get a complete semantic representation
of the user-generated content. We utilize the important
features needed in authorship attribution
        <xref ref-type="bibr" rid="ref4">(Bhargava,
Mehndiratta, and Asawa 2013)</xref>
        and some other features designed
by us to derive syntactic and stylometric features.
Additionally, we include readability-related (Kincaid 1975)
features. Lastly, we unearth the hidden thematic structure of
a document along topics and categories in two ways: a)
using latent topic modelling to construct a topic affinity vector,
and b) using Empath categories
        <xref ref-type="bibr" rid="ref11 ref12">(Fast, Chen, and Bernstein
2016)</xref>
        to construct a category score vector.
      </p>
      <p>Hate speech comprises multiple types of hate, for
example, hate against race, religion, ethnicity, and gender, among
others. We believe that the strength of
homophily varies across different types of hate speech. We
investigate the question, “Does the strength of homophily
vary across the different types of hate?”. We propose to
use latent topic modelling to detect the types of hate present
in a corpus.</p>
      <p>In summary, we make the following contributions:
• We explore a slew of similarity features to capture the
multiple aspects of user-generated content
• We experimentally investigate the usefulness of the
various features, using homophily as the benchmark of
comparison
• We do an in-depth analysis of variations in homophily
strength across the different types of hate</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Many works have attempted to understand homophily on
Twitter.
        <xref ref-type="bibr" rid="ref14">(McPherson, Smith-Lovin, and Cook 2001)</xref>
        were
the first to propose homophily in social networks.
Subsequently,
        <xref ref-type="bibr" rid="ref6">(De Choudhury et al. 2010)</xref>
        study the role of
homophily in the diffusion of information on social networks.
They build on the observation that homophily structures the
ego-networks of individuals and impacts their
communication behavior.
      </p>
      <p>
        <xref ref-type="bibr" rid="ref12">(Halberstam and Knight 2016)</xref>
        investigate the role of
homophily in political information diffusion.
        <xref ref-type="bibr" rid="ref3">(Aral, Muchnik,
and Sundararajan 2009)</xref>
        show that homophily is also an
important factor to explain contagion in product adoption on
dynamic networks.
        <xref ref-type="bibr" rid="ref9">(Ducheneaut et al. 2007)</xref>
        demonstrate
that in the online gaming world, the sustenance of a
gaming guild is driven by homophily. Thus, homophily has been
very well studied in the literature and is very important to
explain many social phenomena happening in the virtual
world.
      </p>
      <p>
        Many papers have jointly utilized similarity and
familiarity for modeling solutions on Twitter and other online social
networks.
        <xref ref-type="bibr" rid="ref1">(Afrasiabi Rad and Benyoucef 2014)</xref>
        study
communities formed over friendships on the YouTube social
network. They observe that communities are formed from
similar users on YouTube; however, they do not find large
similarity values between friends in YouTube communities.
Recently, topical homophily was proposed by
        <xref ref-type="bibr" rid="ref7">(Dey et al. 2018)</xref>
        ,
who show that homophily is a driving factor in the
emergence of topics and their life cycle. However, the
existing literature does not address homophily in hate
speech.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Central Idea</title>
      <p>The proposed methodology has three main parts: a)
features for similarity calculation, b) validating these features
for homophily in hate speech, and c) discovering types of
hate in a corpus using latent topic modelling techniques. We
argue that similarity calculation on social media platforms
should capture multiple aspects of a user, instead of just using
direct textual similarity. These aspects are: user profile
information, writing-related nuances such as stylometry, the
generated content itself, and the topics discussed, among others.
We show that homophily exists in hate speech generation on
a dataset from Twitter along all the aspects utilized for
similarity computation. Finally, we propose a topic modelling
based approach to detect the different types of hate present
in hate speech.</p>
      <sec id="sec-3-1">
        <title>Algorithm 1: Computing Semantic Features for a User</title>
        <p>Input: User u, set of posts P = {p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>M</sub>} with timestamps.</p>
        <p>Output: Semantic embedding S(u) of the user.</p>
        <preformat>
 1: Compute the time span of P as T
 2: Divide T into time windows T = {t_n, t_{n-1}, t_{n-2}, ..., t_1} of size one week,
    where n is the total number of weeks and t_1 is the most recent week
 3: for each time window t_k in T do
 4:     Compute weight(t_k) = 1/k
 5: end for
 6: for each post p in P do
 7:     for each word w in p do
 8:         Compute word embedding E(w) using GloVe
 9:     end for
10:     Compute post embedding E(p) as the mean of the word embeddings E(w)
11:     Find weight W(p) for p using the weight of the time window it falls in
12: end for
13: Compute the user semantic embedding S(u)
14: S(u) = \frac{1}{|P|} \sum_{p_i \in P} E(p_i) \, W(p_i)
        </preformat>
      </sec>
      <sec id="sec-3-2">
        <sec id="sec-3-2-1">
          <title>Features for Similarity Computation</title>
          <p>
            We propose various features to capture the nuances of the
content generated by a user on online social media
platforms. The features are capable of capturing similarity along
semantic, syntactic, stylometric, and topical dimensions.
          </p>
          <p>
Semantic Features. We use word embeddings to represent
user-generated content in a vector form. We get an embedding
for each post made by a user by taking the mean of the word
embeddings and then aggregate the post embeddings to get the
semantic embedding of the user. We use weighted mean
pooling to perform the aggregation. The aggregation
methodology is motivated by
            <xref ref-type="bibr" rid="ref16">(Rajadesingan, Zafarani, and Liu 2015)</xref>
            ,
where the authors introduce the importance of time-decay in
the content produced by a user. Time-decaying aggregation
captures two crucial factors: a) some users are more active
than others, and b) recent tweets are more important compared
to older ones. We split the tweets into time buckets of
size one week. We assign a weight to the tweets in a bucket
inversely proportional to its position in time. Formally, the
time-decay based aggregation is described in Algorithm 1.
          </p>
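          <p>The time-decayed aggregation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy three-dimensional vectors stand in for pretrained GloVe embeddings, the one-week windows are counted back from the user's most recent post, and the names post_embedding and user_semantic_embedding are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from datetime import datetime, timedelta

# Toy 3-dimensional vectors standing in for pretrained GloVe embeddings (assumption).
TOY_EMBEDDINGS = {
    "hate": [1.0, 0.0, 0.0],
    "love": [0.0, 1.0, 0.0],
    "people": [0.0, 0.0, 1.0],
}


def post_embedding(text, embeddings):
    """E(p): mean of the embeddings of the known words in a post, or None."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def user_semantic_embedding(posts, embeddings, window=timedelta(weeks=1)):
    """S(u): time-decayed mean of post embeddings.

    posts is a non-empty list of (timestamp, text) pairs.
    """
    latest = max(ts for ts, _ in posts)
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    for ts, text in posts:
        k = (latest - ts) // window + 1   # window index; k = 1 is the most recent week
        e_p = post_embedding(text, embeddings)
        if e_p is None:                   # no known words: post adds nothing to the sum
            continue
        for d in range(dim):
            total[d] += e_p[d] / k        # weight(t_k) = 1/k
    return [x / len(posts) for x in total]  # divide by |P|


now = datetime(2020, 1, 15)
posts = [
    (now, "hate people"),                      # falls in window k = 1
    (now - timedelta(days=8), "love people"),  # falls in window k = 2
]
s_u = user_semantic_embedding(posts, TOY_EMBEDDINGS)
```
]]></preformat>
          <p>In this sketch a post with no known words still counts towards |P|, which mirrors the division by |P| in the algorithm; whether such posts should instead be dropped is a design choice the text does not settle.</p>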
          <p>
            Syntactic Features. We use important features proposed in
            <xref ref-type="bibr" rid="ref16">(Rajadesingan, Zafarani, and Liu 2015)</xref>
            and some features
designed by us to compute a syntactic feature vector for a
user. These features include the numbers of capitalized words,
question marks, exclamation marks, numbers, URLs, user mentions,
hashtags, and emojis present in a tweet, each averaged over
all the tweets posted by the user.
          </p>
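          <p>A per-user syntactic vector of this kind could be computed roughly as below. The regular expressions are simplifying assumptions rather than the exact definitions used in the paper; in particular, a capitalized word is taken to be an all-uppercase token of two or more letters, and the emoji pattern covers only a few common Unicode ranges. The names tweet_counts and syntactic_vector are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NUMBER_RE = re.compile(r"\b\d+(?:\.\d+)?\b")
CAPITAL_WORD_RE = re.compile(r"\b[A-Z]{2,}\b")
# Rough emoji approximation: a few common Unicode blocks only (assumption).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

FEATURES = ("capital_words", "question_marks", "exclamations", "numbers",
            "urls", "mentions", "hashtags", "emojis")


def tweet_counts(tweet):
    """Count each syntactic feature in a single tweet."""
    urls = URL_RE.findall(tweet)
    rest = URL_RE.sub(" ", tweet)            # keep URL text out of the other counts
    mentions = MENTION_RE.findall(rest)
    hashtags = HASHTAG_RE.findall(rest)
    plain = HASHTAG_RE.sub(" ", MENTION_RE.sub(" ", rest))
    return {
        "capital_words": len(CAPITAL_WORD_RE.findall(plain)),
        "question_marks": plain.count("?"),
        "exclamations": plain.count("!"),
        "numbers": len(NUMBER_RE.findall(plain)),
        "urls": len(urls),
        "mentions": len(mentions),
        "hashtags": len(hashtags),
        "emojis": len(EMOJI_RE.findall(plain)),
    }


def syntactic_vector(tweets):
    """Average the per-tweet counts over all tweets of one user."""
    counts = [tweet_counts(t) for t in tweets]
    return [sum(c[f] for c in counts) / len(counts) for f in FEATURES]


example_tweets = [
    "WOW!!! check https://t.co/abc @bob #news 42",
    "why? \U0001F600",
]
vector = syntactic_vector(example_tweets)
```
]]></preformat>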
        </sec>
        <sec id="sec-3-2-2">
          <title>Stylometric and Readability Features</title>
          <p>
            We use important features for authorship attribution from
            <xref ref-type="bibr" rid="ref4">(Bhargava, Mehndiratta, and Asawa 2013)</xref>
            and some
features from
            <xref ref-type="bibr" rid="ref16">(Rajadesingan, Zafarani, and Liu 2015)</xref>
            .
Authorship attribution aims to identify the author of a piece of
content, which motivated us to use these features to
capture the style of a user. These features include the number of
words per tweet, number of sentences per tweet, number of
elongated words per tweet (e.g. hiiii), number of repeated
words per tweet, the word length distribution (a vector of length
19 holding the frequency of words of each
length), and the mean, median, and standard deviation of this
distribution
            <xref ref-type="bibr" rid="ref16">(Rajadesingan, Zafarani, and Liu 2015)</xref>
            .
          </p>
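          <p>The stylometric features listed above might be extracted along the following lines. Several details are assumptions made for illustration: sentences are split on runs of terminal punctuation, an elongated word is one with a character repeated three or more times in a row, a repeated word is a distinct word occurring more than once in the same tweet, words longer than 19 characters are clipped into the last bin, and the mean, median and standard deviation are taken over the observed word lengths. The function name stylometric_features is hypothetical, and the input is assumed to contain at least one word.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import re
import statistics
from collections import Counter

MAX_LEN = 19  # length of the word-length distribution vector, as stated in the text
WORD_RE = re.compile(r"[A-Za-z']+")
ELONGATED_RE = re.compile(r"(\w)\1{2,}")  # same character three or more times in a row


def stylometric_features(tweets):
    """Per-user stylometric features averaged over a non-empty list of tweets."""
    n = len(tweets)
    total_words = sentences = elongated = repeated = 0
    lengths = []
    for tweet in tweets:
        words = WORD_RE.findall(tweet.lower())
        total_words += len(words)
        sentences += len([s for s in re.split(r"[.!?]+", tweet) if s.strip()])
        elongated += sum(1 for w in words if ELONGATED_RE.search(w))
        repeated += sum(1 for c in Counter(words).values() if c > 1)
        lengths.extend(min(len(w), MAX_LEN) for w in words)

    distribution = [0] * MAX_LEN          # distribution[i] counts words of length i + 1
    for length in lengths:
        distribution[length - 1] += 1

    return {
        "words_per_tweet": total_words / n,
        "sentences_per_tweet": sentences / n,
        "elongated_per_tweet": elongated / n,
        "repeated_per_tweet": repeated / n,
        "word_length_distribution": distribution,
        "mean_word_length": statistics.mean(lengths),
        "median_word_length": statistics.median(lengths),
        "std_word_length": statistics.pstdev(lengths),
    }


style = stylometric_features(["hiiii there there. So good!", "ok"])
```
]]></preformat>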
          <p>Additionally, we compute a readability score for
each user using the Flesch-Kincaid Reading Ease formula
(Kincaid 1975). We create a document d for each user u by
combining all the tweets, as shown in Equation 1, and then
perform the readability computation.</p>
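          <p>As an illustration of the readability step, the sketch below assumes that the score meant is the standard Flesch Reading Ease, 206.835 - 1.015 (words / sentences) - 84.6 (syllables / words). The syllable counter is a crude vowel-group heuristic, tweets are joined with newlines so that each tweet ends a sentence, and the function names are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per group of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(document):
    """Flesch Reading Ease of a document containing at least one word."""
    sentences = [s for s in re.split(r"[.!?\n]+", document) if s.strip()]
    words = re.findall(r"[A-Za-z']+", document)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def user_readability(tweets):
    """Build the user document d by joining all tweets, then score it."""
    return flesch_reading_ease("\n".join(tweets))


score = user_readability(["The cat sat.", "It ran!"])
```
]]></preformat>
          <p>Higher scores indicate text that is easier to read; a production setting would likely replace the syllable heuristic with a pronunciation dictionary.</p>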
          <p>
            Topical Features. We compute topical features in two ways:
a) we perform topic modelling on the user-generated content,
employing the latent topic modelling technique described
in the next section, and use the latent affinity to topics to
construct a topic vector for each user; and b) using the
methodology proposed in
            <xref ref-type="bibr" rid="ref11 ref12">(Fast, Chen, and Bernstein 2016)</xref>
            , we construct
a vector of Empath category scores.
          </p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Hateful Forms Detection using Topic Modelling</title>
          <p>
            We use a latent topic detection technique called LDA
            <xref ref-type="bibr" rid="ref5">(Blei,
Ng, and Jordan 2003)</xref>
            to detect the latent topics present in
a tweet. Because tweets are short in length and large in
number, scaling LDA to detect topics when every tweet
is treated as one document is very challenging. Therefore,
we create one document per user by concatenating all of the
user's posts, which include tweets, retweets, and quotes. Let a user
u<sub>i</sub> have made posts P, where P = {p<sub>1</sub>, p<sub>2</sub>, ..., p<sub>N</sub>}. Then, the
document d<sub>i</sub> for user u<sub>i</sub> is created by concatenating all the
posts into one document. Therefore, we have:
            <disp-formula id="eq1"><label>(1)</label><tex-math>d_i = \bigcup_{\forall p_j \in P} p_j</tex-math></disp-formula>
Let D = {d<sub>i</sub> : i ∈ 1..n} be the corpus of documents. We
further investigate D to detect the latent topics present in it
using LDA based techniques. We explore two variants of
inference for LDA: a) variational Bayes and
b) Gibbs sampling. Let the set of latent topics be T, where
T = {t<sub>1</sub>, t<sub>2</sub>, ..., t<sub>|T|</sub>}. The latent topic modelling produces a
vector of topic affinity scores T<sub>d<sub>i</sub></sub> for each document d<sub>i</sub>. Let a<sub>k</sub>
be the affinity score of d<sub>i</sub> with respect to topic t<sub>k</sub>; then we have the
topic affinity vector as follows:
            <disp-formula id="eq2"><label>(2)</label><tex-math>T_{d_i} = \langle a_1, a_2, a_3, \ldots, a_{|T|} \rangle</tex-math></disp-formula>
          </p>
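          <p>To make the construction of per-user documents and topic affinity vectors concrete, the following is a small pure-Python collapsed Gibbs sampler for LDA. It is a teaching sketch under stated assumptions and not the MALLET-based pipeline used in the experiments: tokenization is plain whitespace splitting, the hyperparameter defaults and iteration count are placeholders, and the names user_document and lda_gibbs are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import random


def user_document(posts):
    """d_i: all posts of one user concatenated into a single token list."""
    return [tok for post in posts for tok in post.lower().split()]


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents.

    Returns one topic affinity vector (a distribution over topics) per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]                 # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]   # word v assigned to topic k
    topic_total = [0] * num_topics                               # all tokens in topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][n]
                # Remove the token from the counts, then resample its topic.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed, normalised topic counts give the affinity vector of each document.
    return [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in doc_topic[d]]
        for d, doc in enumerate(docs)
    ]


user_posts = {
    "u1": ["race hate race", "race hate"],
    "u2": ["gender hate gender", "gender gender"],
}
corpus = [user_document(posts) for posts in user_posts.values()]
affinity = lda_gibbs(corpus, num_topics=2, iterations=50)
```
]]></preformat>
          <p>Each returned vector plays the role of the topic affinity vector in Equation 2 for one user document; its entries are positive and sum to one.</p>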
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <sec id="sec-4-1">
        <title>Experiments Overview</title>
        <p>The purpose of the experiments is to investigate the
following research questions:
• RQ1: Is homophily exhibited by the users generating
hateful content and does it vary across the different types of
similarity aspects?
• RQ2: Is homophily pronounced for particular hateful
forms?</p>
      </sec>
      <sec id="sec-4-2">
        <title>Experiment Settings</title>
      </sec>
      <sec id="sec-4-3">
        <title>Experimental dataset</title>
        <p>
          We use the dataset provided by
          <xref ref-type="bibr" rid="ref17">(Ribeiro et al. 2017)</xref>
          . This
dataset contains the 200 most recent tweets of 100,386 users,
totaling 19M tweets. It also contains a retweet-induced
graph of the users, which has 2,286,592 directed edges. The
dataset does not have labels for the tweet content.
Therefore, we manually annotate the tweets as hateful or not.
Annotating 19M tweets of all the users is a costly and
time-consuming process. Therefore, we pick only a subset of
the users whose tweets we manually annotate. We run
modularity optimization-based community detection using
networkx<sup>1</sup> on the retweet network to pick a subset of users.
The two communities picked have a roughly equal number of edges,
around 160,000, while they have 7,679 and 3,277 users,
respectively. These two communities have a sufficient number
of users (from the perspective of the number of tweets to
label), and edge density varies significantly between the two.
        </p>
      </sec>
      <sec id="sec-4-4">
        <title>Parameter setting</title>
        <p>
          As the familiarity metric, we use whether or not an edge exists
between a pair of users. We compute similarity in six
different ways. Semantic features are constructed using GloVe
word embeddings<sup>2</sup>, while syntactic and stylometric features are
extracted based on
          <xref ref-type="bibr" rid="ref16 ref4">(Bhargava, Mehndiratta, and Asawa 2013;
Rajadesingan, Zafarani, and Liu 2015)</xref>
          . We construct
latent topical features by running Latent Dirichlet Allocation
(LDA) using MALLET<sup>3</sup> on the tweet corpus. The tweet corpus
consists of tweet documents for all the users, wherein a
tweet document is created for each user by concatenating
all of the user's posts. We set α = 5.0 and β = 0.01. We use the
library empath-client<sup>4</sup> to compute a category score vector for
each user using the tweet document.
        </p>
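        <p>One simple way to relate the edge-based familiarity metric to a feature-based similarity is to compare the mean cosine similarity of connected and unconnected user pairs, as sketched below. This is an illustrative probe under stated assumptions, not necessarily the procedure behind the reported plots: retweet edges are treated as undirected, every user pair is enumerated (which is only practical for small user sets), and the function names are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_by_familiarity(features, edges):
    """Mean cosine similarity for familiar (connected) and unfamiliar user pairs.

    features maps a user id to a feature vector; edges is an iterable of
    (user, user) pairs, treated as undirected here (assumption).
    """
    linked = {frozenset(edge) for edge in edges}
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for u, v in combinations(sorted(features), 2):
        familiar = frozenset((u, v)) in linked
        sums[familiar] += cosine_similarity(features[u], features[v])
        counts[familiar] += 1
    return {
        familiar: (sums[familiar] / counts[familiar] if counts[familiar] else None)
        for familiar in (True, False)
    }


toy_features = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
toy_edges = [("a", "b")]
result = similarity_by_familiarity(toy_features, toy_edges)
```
]]></preformat>
        <p>A higher mean for connected pairs than for unconnected pairs is the pattern one would expect under homophily; the same function can be run once per feature type.</p>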
        <p>
          We again use LDA to detect hateful topics. In this case,
we only pick hateful tweets to create a tweet document for
a user. We perform a grid search where we vary alpha between
0.1 and 0.01 and the number of topics from 6 to 12. The
number of iterations for each run is 500. We look at the
coherence scores
          <xref ref-type="bibr" rid="ref18">(Röder, Both, and Hinneburg 2015)</xref>
          and a
visualization of the topics in terms of overlap using pyLDAvis<sup>5</sup>. We
find that the best performing topic model, which has both a
high coherence score and the least number of overlapping
topics, is obtained when α = 0.1 and the number of topics is 8. We
observe that α = 0.01 gives us a higher coherence score for
the same number of topics as compared to α = 0.1.
        </p>
      </sec>
      <sec id="sec-4-5">
        <title>Experiments Results</title>
        <p>To answer RQ1, we plot similarity, computed as cosine
similarity, against familiarity for the six types of similarity
metrics in Figures 1 and 2 for community 1 and community
2, respectively. We vary the hatefulness of the users on the
x-axis, where the hatefulness of a user is defined as the percentage
of hateful tweets. We see that as the hatefulness of the users
increases, the similarity values for all types of features also
increase. We further observe that this pattern is enhanced in
topic-based similarity. This hints that latent topics are
capable of capturing the higher-level semantics of the
discussions happening on social media platforms.</p>
        <p>To answer RQ2, we create a user base for each hate
type (topic). We pick users whose affinity score is above a
certain threshold. We also rank the different hashtags used
by users by frequency. This is shown in Table 1. We
observe that many users show higher values of association with
specific topics (0, 5, 7), as compared to the rest. Therefore,
we decide to use a dynamic threshold for topic affinity. To
compute these affinity thresholds, we select users in such a
way that there is a reasonable number (at least 10% of
total users) of representative users. For each topic, we plot the
average familiarity and average similarity in Figure 3. The
similarity and familiarity values are normalized by dividing by
their respective maximum values. We observe that topics 3
and 7 exhibit stronger homophily, as compared to others, for
both the communities. These topics can be broadly
categorized as hate manifesting in nationalism and racism.</p>
        <p>Footnotes: (1) https://networkx.org/ (2) https://github.com/stanfordnlp/GloVe (3) http://mallet.cs.umass.edu/ (4) https://github.com/Ejhfast/empath-client (5) https://pypi.org/project/pyLDAvis/</p>
        <sec id="sec-4-5-1">
          <title>Conclusion</title>
          <p>
In this paper, we demonstrate homophily in hate speech on
social media platforms. We also show that certain hate types
exhibit stronger homophily in comparison to others. Unlike
in the real world, features to compute similarity on social
media platforms are not straightforward to define. Therefore,
we propose a slew of features to capture similarity along
multiple aspects present on social media platforms. We
demonstrate homophily in hate speech generation along
all these aspects. Further, we observe the variation of
homophily in different classes of hate. We find that racism
and xenophobia (nationalism) show stronger evidence of
homophily among users.</p>
          <p>Kincaid, J. 1975. Derivation of New Readability Formulas:
(automated Readability Index, Fog Count and Flesch
Reading Ease Formula) for Navy Enlisted Personnel. Research
Branch report. Chief of Naval Technical Training, Naval Air
Station Memphis.</p>
          <p>Mathew, B.; Dutt, R.; Goyal, P.; and Mukherjee, A. 2019.
Spread of hate speech in online social media. In Proceedings
of the 10th ACM Conference on Web Science, 173–182.</p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Afrasiabi Rad</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Benyoucef</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Similarity and ties in social networks a study of the youtube social network</article-title>
          .
          <source>Journal of Information Systems Applied Research</source>
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <fpage>14</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Aral</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Muchnik</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Sundararajan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2009</year>
          .
          <article-title>Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          <volume>106</volume>
          (
          <issue>51</issue>
          ):
          <fpage>21544</fpage>
          -
          <lpage>21549</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Bhargava</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Mehndiratta</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Asawa</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Stylometric Analysis for Authorship Attribution on Twitter</article-title>
          .
          <source>In Big Data Analytics</source>
          ,
          <fpage>37</fpage>
          -
          <lpage>47</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2003</year>
          .
          <article-title>Latent dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          :
          <fpage>993</fpage>
          -
          <lpage>1022</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>De Choudhury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Sundaram</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>John</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Seligmann</surname>
            ,
            <given-names>D. D.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Kelliher</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2010</year>
          .
          <article-title>“Birds of a Feather”: Does User Homophily Impact Information Diffusion in Social Media?</article-title>
          arXiv preprint arXiv:1006.1702.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Dey</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Shrivastava</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Kaushik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Garg</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2018</year>
          .
          <article-title>Assessing Topical Homophily on Twitter</article-title>
          .
          <source>In International Conference on Complex Networks and their Applications</source>
          ,
          <fpage>367</fpage>
          -
          <lpage>376</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Ducheneaut</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Yee</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Nickell</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>R. J.</given-names>
          </string-name>
          <year>2007</year>
          .
          <article-title>The life and death of online gaming communities: a look at guilds in world of warcraft</article-title>
          .
          <source>In Proceedings of the SIGCHI conference on Human factors in computing systems</source>
          ,
          <fpage>839</fpage>
          -
          <lpage>848</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Fast</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>M. S.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Empath: Understanding topic signals in large-scale text</article-title>
          .
          <source>In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems</source>
          ,
          <fpage>4647</fpage>
          -
          <lpage>4657</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Halberstam</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Knight</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Homophily, group size, and the diffusion of political information in social networks: Evidence from Twitter</article-title>
          .
          <source>Journal of public economics</source>
          <volume>143</volume>
          :
          <fpage>73</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <year>2018</year>
          .
          <article-title>Analyzing the hate and counter speech accounts on Twitter</article-title>
          . arXiv preprint arXiv:1812.02712.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>McPherson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Smith-Lovin</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Cook</surname>
            ,
            <given-names>J. M.</given-names>
          </string-name>
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <article-title>Birds of a feather: Homophily in social networks</article-title>
          .
          <source>Annual Review of Sociology</source>
          <volume>27</volume>
          (
          <issue>1</issue>
          )
          :
          <fpage>415</fpage>
          -
          <lpage>444</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Rajadesingan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Zafarani</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Sarcasm detection on twitter: A behavioral modeling approach</article-title>
          .
          <source>In WSDM</source>
          ,
          <fpage>97</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Ribeiro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Calais</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>dos Santos</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Almeida</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Meira Jr</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          <year>2017</year>
          .
          <article-title>“Like Sheep Among Wolves”: Characterizing Hateful Users on Twitter</article-title>
          .
          <source>In MIS2 Workshop at WSDM'2018</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Röder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Both</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Hinneburg</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2015</year>
          .
          <article-title>Exploring the space of topic coherence measures</article-title>
          .
          <source>In Proceedings of the eighth ACM international conference on Web search and data mining</source>
          ,
          <fpage>399</fpage>
          -
          <lpage>408</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Starbird</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ; and
          <string-name>
            <surname>Palen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <year>2012</year>
          .
          <article-title>(How) will the revolution be retweeted? Information diffusion and the 2011 Egyptian uprising</article-title>
          .
          <source>In Proceedings of the ACM 2012 Conference on Computer Supported Cooperative Work</source>
          ,
          <fpage>7</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>