<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Measuring citizen participation in South African public debates using Twitter: An exploratory study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Selvas Mwanza</string-name>
          <email>smwanza@cs.uct.ac.za</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hussein Suleman</string-name>
          <email>hussein@cs.uct.ac.za</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Cape Town</institution>
          ,
          <addr-line>Cape Town</addr-line>
          ,
          <country country="ZA">South Africa</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>ICT4D Research Centre, University of Cape Town</institution>
          ,
          <addr-line>Cape Town</addr-line>
          ,
          <country country="ZA">South Africa</country>
        </aff>
      </contrib-group>
      <fpage>26</fpage>
      <lpage>34</lpage>
      <abstract>
        <p>This paper addresses the task of measuring Twitter social attributes that can be used to detect patterns of user participation in public debates in South Africa. We propose a method that leverages observable information on Twitter, such as the use of language, retweeting behaviour, and the relationship between topics and the user social network graph. Our experimental results suggest high degrees of citizen participation: people in an otherwise multilingual country tweet in a dominant language; there is more original commentary and interactive discussion; and topics often span natural online communities.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>With its large user base and the ease of publishing content, Twitter has become an ideal platform for many people to communicate, and it also serves as a platform for expressing opinions on topics such as politics, sports and socio-economic issues.
Users on Twitter can converse and interact in different ways. A user can follow another user; a user who follows another user subscribes to receive the Twitter messages (tweets) posted by the followed user. Users can reference each other in messages using the @ symbol followed by the username (e.g., I miss @cindy my best friend). Users can also forward a message to others. Twitter adds the prefix RT @username at the beginning of every forwarded tweet, where the username after the @ symbol is the name of the user who originally posted the message. In addition, users can add a hashtag, a word prefixed with the # symbol, to indicate what a message is about.</p>
      <p>
        In 2015, university students in South Africa protested against the increase in university fees
        <xref ref-type="bibr" rid="ref12">(Griffin, 2015)</xref>. This was mirrored on Twitter, where the #FeesMustFall hashtag created for the protests trended worldwide. This provides evidence that citizens in South Africa have adopted Twitter as a platform for participating in debates on socio-economic issues.
      </p>
      <p>
        Social media mining is the process of representing, analysing, and extracting actionable patterns from social media data
        <xref ref-type="bibr" rid="ref36">(Zafarani et al., 2014)</xref>. Twitter data has been mined by researchers around the world. Examples of Twitter mining include financial prediction
        <xref ref-type="bibr" rid="ref22">(Mao et al., 2012)</xref>, extracting market and business insights
        <xref ref-type="bibr" rid="ref22 ref30">(Park and Chung, 2012)</xref>, political analysis
        <xref ref-type="bibr" rid="ref27">(Monti et al., 2013)</xref>, mass movement analysis
        <xref ref-type="bibr" rid="ref3">(Borge-Holthoefer et al., 2015)</xref> and monitoring of natural disasters and crises
        <xref ref-type="bibr" rid="ref32">(Takeshi et al., 2010)</xref>. Although much research has been done on Twitter data, little attention has been given to Twitter data produced in Africa.
      </p>
      <p>In this paper, we address the task of measuring citizen participation in public debates on Twitter. We use standard methods such as language detection, graph partitioning and graph centrality measures to detect patterns in the use of language, retweeting behaviour, and the relationship between topics and user communities, and we use these patterns to measure user participation in public debates in South Africa.</p>
      <p>The paper is organized as follows. Section 2 reviews the literature on social media analysis. Section 3 describes in detail our methodology for measuring citizen participation in South Africa using Twitter data, while Section 4 reports on the experimental design and the results. Finally, Section 5 presents our conclusions and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Literature Review</title>
      <p>This section reviews previous work related to ours.</p>
      <sec id="sec-2-1">
        <title>Graph partitioning</title>
        <p>
          Graph partitioning, or community detection, aims to identify groups in a graph using only the information encoded in the graph topology
          <xref ref-type="bibr" rid="ref17">(Lancichinetti and Fortunato, 2009)</xref>. Lancichinetti and Fortunato (2009) reviewed various disjoint community detection algorithms, which partition a graph into disjoint groups and have wide application. Recently, with the rise of social media mining, attention has turned to overlapping community detection algorithms, which identify a set of partitions that are not necessarily disjoint
          <xref ref-type="bibr" rid="ref34">(Xie et al., 2013)</xref>, so that a node in the graph can belong to more than one partition. This suits social media, where people usually have connections to several social groups such as family, friends, and colleagues. Java (2007) used an overlapping community detection algorithm called the clique percolation method (CPM) to detect overlapping communities in a Twitter network; CPM was used to find how communities connect to each other through overlapping components. Overlapping community detection has also been used to explain how information cascades through Twitter communities
          <xref ref-type="bibr" rid="ref2">(Barbieri et al., 2013)</xref>. The authors used a community detection algorithm to find the level of authority and passive interest of a node in each community it belongs to.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Graph centrality measures</title>
        <p>
          Node centrality measures a node's involvement in the walk structure of a network
          <xref ref-type="bibr" rid="ref7">(Freeman, 1978)</xref>. Freeman defined three centrality measures, namely degree, closeness and betweenness. Degree centrality is the number of edges incident upon a given node. Closeness is based on the total geodesic distance from a given node to all other nodes (the length of a walk is the number of edges it contains, and a shortest path between two nodes is called a geodesic); the smaller this total, the more central the node. Betweenness measures the extent to which geodesics between other pairs of nodes pass through a given node. Centrality measures have been used to rank and understand nodes in social networks. Ediger et al. (2010) used betweenness centrality to rank nodes in clusters of conversations in Twitter data. Betweenness centrality scores have also been used to detect spammers on Twitter
          <xref ref-type="bibr" rid="ref35">(Yang et al., 2011)</xref>: the authors used betweenness centrality to rank users in a graph and then used the ranking scores to identify spammers.
        </p>
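        <p>To make Freeman's degree and closeness definitions concrete, the following minimal Python sketch computes both on a small, hypothetical unweighted and undirected graph stored as an adjacency dictionary. It assumes a connected graph and reports closeness as the inverse of the total geodesic distance, one common way of turning the total-distance idea into a score where larger means more central; the function names are illustrative and do not come from any cited tool.

```python
from collections import deque


def degree_centrality(adj):
    """Degree: number of edges incident on each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def geodesic_distances(adj, source):
    """Breadth-first search: shortest-path length (in edges) to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj):
    """Inverse of the total geodesic distance to all other nodes (connected graph assumed)."""
    result = {}
    for v in adj:
        total = sum(geodesic_distances(adj, v).values())
        result[v] = 1 / total if total else 0.0
    return result


# Hypothetical path graph a - b - c - d
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(degree_centrality(adj))  # {'a': 1, 'b': 2, 'c': 2, 'd': 1}
print(closeness(adj))
```

In this path graph the inner nodes b and c have the highest degree and the smallest total distance (4 edges, against 6 for the end nodes), so they score highest on both measures.</p>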
      </sec>
      <sec id="sec-2-3">
        <title>Use of language detection in Twitter</title>
        <p>
          Language detection is the task of detecting the natural language in which a document is written
          <xref ref-type="bibr" rid="ref20">(Lui et al., 2004)</xref>. Language detection has been applied to Twitter data
          <xref ref-type="bibr" rid="ref13 ref24 ref33 ref35">(Hong and Convertino, 2011)</xref>
          to discover cross-language differences in the adoption of features such as URLs, hashtags, mentions, replies, and retweets. The authors used a combination of the LingPipe text classifier and the Google language API to classify 62,556,331 tweets, collected over a period of four weeks, into languages, and then analysed how each language group uses URLs, hashtags, mentions, replies, and retweets. Language use has also been exploited as a primary signal for detecting spam in tweets
          <xref ref-type="bibr" rid="ref10 ref2 ref23 ref34 ref4">(Martinez-Romo and Araujo, 2013)</xref>. The authors examine the language used in the topic, in the tweet, and in the page linked from the tweet. They assume that the language models of a spam tweet will diverge substantially, because the spammer is usually trying to divert traffic to sites that have no semantic relation to the topic. They exploit this divergence between the language models to classify tweets as spam or non-spam.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>In this section we describe in detail three types of social attributes that can help in measuring citizen participation: use of language, retweeting behaviour, and the relationship between topics and the user network graph. We use these three metrics to detect patterns that measure citizen participation in public debates in South Africa.</p>
      <sec id="sec-3-1">
        <title>Use of Language</title>
        <p>South Africa is a multilingual country with eleven official languages: English, Afrikaans, Zulu, Xhosa, Ndebele, Northern Sotho, Sotho, Swati, Tsonga, Tswana and Venda. English and Afrikaans are high-resource languages, while the other languages, which are Bantu languages, are low-resource languages. In our work, we are interested in detecting English and Afrikaans in tweets; tweets that cannot be detected as English or Afrikaans are categorized as other.</p>
        <p>
          Tweets are informal. They contain special tokens such as @ for usernames and # for trending topics, and they include http links to related content. They also contain slang, misspellings and grammatical errors. We implemented a program called SATwitterCleaner that cleans the dataset before language detection. Cleaning involved the following steps:
          <list list-type="order">
            <list-item><p>Removing usernames: the program removes all usernames in the dataset by searching for words that start with the @ symbol, following the convention that all usernames in Twitter messages are prefixed with @.</p></list-item>
            <list-item><p>Removing the hashtag (#) symbol: the program removes hashtag symbols by searching for the # character in each message.</p></list-item>
            <list-item><p>Removing URLs: Twitter users reference external sources by inserting URLs in their messages. SATwitterCleaner uses a string pattern that identifies URLs in Twitter messages and removes them.</p></list-item>
            <list-item><p>Removing emoticons: an emoticon is a representation of a facial expression used in electronic communication to convey the writer’s feelings, and the online community uses different emoticons for different expressions. We compiled 15 emoticons used for happy expressions and 11 used for sad expressions; the program uses this list to identify and remove emoticons from messages.</p></list-item>
            <list-item><p>Expanding slang words into their actual meaning: slang is the use of informal words and expressions that are not considered standard in the speaker’s language or dialect but are acceptable in certain social settings (for example, 2b means to be). We created a slang dictionary of 5,364 slang words, each mapped to its actual meaning, which SATwitterCleaner uses to expand all slang words found in the dataset.</p></list-item>
            <list-item><p>Correcting spelling and grammatical errors in English tweets: the program uses the LanguageTool library to correct grammar. LanguageTool (LT) is based on surface text processing without deep parsing, yet it achieves significantly better results for some languages than commercially available products
          <xref ref-type="bibr" rid="ref25">(Miłkowski, 2010)</xref>. Spelling was checked and corrected with the Jazzy spell checker
          <xref ref-type="bibr" rid="ref14">(Idzelis, 2005)</xref>, which integrates the Double Metaphone phonetic matching algorithm and the Levenshtein distance using a near-miss strategy. We chose Jazzy because it gives suggestions when a word is misspelled; SATwitterCleaner picks the first option in the suggestion list as the replacement for the misspelled word. Because we could not find equivalent library tools for Afrikaans, grammar and spelling correction was applied to English text only.</p></list-item>
            <list-item><p>Reducing repeated characters in words to the correct number: we developed a method for English text that removes repeated characters in words. English words seldom repeat a character more than twice in a row, although some words do contain a tripled character; we compiled a list of 21 such English words, which the program leaves untouched. For any other word with repeated characters, the program first reduces each run of repeated characters to two. Then, using the Jazzy spell checker
          <xref ref-type="bibr" rid="ref14">(Idzelis, 2005)</xref>, it checks whether the result is a correct English word. If not, the spell checker is used to get the closest suggested word, and the program computes the cosine similarity distance between the suggested word and the original word. If the distance is below a threshold, the suggested word is taken as the replacement; otherwise, the replacement is skipped. We chose a similarity distance threshold of 1.</p></list-item>
          </list>
        </p>
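        <p>Steps 1 to 5 and the first half of step 7 can be sketched in Python as below. This is not SATwitterCleaner itself, which was written in Java: the regular expressions, the six emoticons, the three slang entries and the single triple-letter exception word are hypothetical stand-ins for the paper's lists, and step 6 together with the spell-checking half of step 7 (LanguageTool and Jazzy) is omitted. URLs are removed first in this sketch so that @ or # characters inside links do not interfere with the other steps.

```python
import re

# Hypothetical stand-ins for the paper's resources (26 emoticons, a 5,364-entry
# slang dictionary, and 21 English words containing a tripled letter).
EMOTICONS = [":-)", ":)", ":D", ":'(", ":-(", ":("]
SLANG = {"2b": "to be", "gr8": "great", "u": "you"}
TRIPLE_LETTER_WORDS = {"shellless"}

URL_RE = re.compile(r"https?://\S+")    # step 3: http and https links
USERNAME_RE = re.compile(r"@\w+")       # step 1: tokens such as @cindy
REPEAT_RE = re.compile(r"(\w)\1{2,}")   # step 7: a character repeated 3 or more times


def reduce_repeats(word):
    """Step 7, first half only: shorten each run of a repeated character to two."""
    if word.lower() in TRIPLE_LETTER_WORDS:
        return word
    return REPEAT_RE.sub(r"\1\1", word)


def clean_tweet(text):
    text = URL_RE.sub(" ", text)              # step 3: remove URLs
    text = USERNAME_RE.sub(" ", text)         # step 1: remove usernames
    text = text.replace("#", " ")             # step 2: drop the # symbol, keep the word
    for emoticon in EMOTICONS:                # step 4: remove listed emoticons
        text = text.replace(emoticon, " ")
    words = []
    for word in text.split():
        word = SLANG.get(word.lower(), word)  # step 5: expand slang
        words.append(reduce_repeats(word))
    return " ".join(words)


print(clean_tweet("@cindy gr8 news #FeesMustFall soooo happy :) https://t.co/abc"))
# -> great news FeesMustFall soo happy
```

Dropping only the # character keeps the hashtag word available for language detection, which is one reading of step 2.</p>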
        <p>
          After data cleaning, we used a combination of a Naive Bayes classifier and simple word statistics to detect English and Afrikaans tweets. We used LangDetect, which implements a Naive Bayes classifier using a character n-gram representation without feature selection, together with a set of normalization heuristics to improve accuracy
          <xref ref-type="bibr" rid="ref28">(Nakatani, 2010)</xref>. Lui and Baldwin (2014) compared the performance of eight off-the-shelf language detection systems to determine which would be most suitable for Twitter data: langid.py
          <xref ref-type="bibr" rid="ref1">(Baldwin et al., 2013)</xref>, CLD2
          <xref ref-type="bibr" rid="ref24">(McCandless, 2010)</xref>, LangDetect
          <xref ref-type="bibr" rid="ref19 ref36">(Lui and Baldwin, 2014)</xref>, LDIG
          <xref ref-type="bibr" rid="ref29">(Nakatani, 2012)</xref>, whatlang
          <xref ref-type="bibr" rid="ref4">(Brown, 2013)</xref>, YALI
          <xref ref-type="bibr" rid="ref21">(Majlis, 2012)</xref>, TextCat
          <xref ref-type="bibr" rid="ref31">(Scheelen, 2003)</xref> and MSR-LID
          <xref ref-type="bibr" rid="ref10">(Goldszmidt et al., 2013)</xref>. They compared the systems on four different Twitter datasets and found that LDIG outperforms all the other systems, although it supports a limited number of languages and Afrikaans is not one of them. Overall, they concluded that, in their off-the-shelf configuration, only three systems (LangDetect, langid.py and CLD2) perform consistently well on language detection of Twitter messages. Our Twitter message cleaner, SATwitterCleaner, and our language detection program were developed in Java, so we chose LangDetect because it has Java support. Simple word statistics classify a tweet by counting how many of its words are English or Afrikaans: if at least 50% of the words are English (or Afrikaans), the tweet is classified as English (or Afrikaans). All tweets that LangDetect did not detect as English were classified using the simple word statistics, which allowed us to compensate for inaccuracies in LangDetect. Only tweets with more than three words were considered for language detection.
        </p>
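        <p>The fallback rule described above can be sketched in Python as follows. The primary detector is passed in as a callable because LangDetect itself is a Java library; the two mini-lexicons and the stub detector are hypothetical, and checking English before Afrikaans when both shares reach 50% is our own tie-breaking assumption, since the text does not specify one.

```python
# Hypothetical mini-lexicons; real English and Afrikaans word lists would
# contain many thousands of entries.
ENGLISH = {"the", "students", "fees", "must", "fall", "is", "and", "we", "to"}
AFRIKAANS = {"die", "ons", "moet", "gelde", "nie", "en", "is", "studente"}


def word_share(words, lexicon):
    """Fraction of the tweet's words found in the lexicon."""
    return sum(1 for word in words if word in lexicon) / len(words)


def detect_language(tweet, primary_detector):
    """Primary detector first, then simple word statistics as a fallback.

    primary_detector is any callable returning a language code; only an
    'en' answer is accepted directly. Tweets of three words or fewer are
    not classified and yield None.
    """
    words = tweet.lower().split()
    if 3 >= len(words):
        return None
    if primary_detector(tweet) == "en":
        return "english"
    # Words in both lexicons (e.g. "is") count towards both shares;
    # English is checked first as an arbitrary tie-break.
    if word_share(words, ENGLISH) >= 0.5:
        return "english"
    if word_share(words, AFRIKAANS) >= 0.5:
        return "afrikaans"
    return "other"


def undecided(text):
    """Stub detector that never answers 'en', forcing the fallback."""
    return "unknown"


print(detect_language("ons moet die gelde nie betaal nie", undecided))  # afrikaans
print(detect_language("the students say fees must fall", undecided))    # english
```
</p>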
        <sec id="sec-3-1-1">
          <title>Retweeting user behaviour</title>
          <p>Twitter adds the prefix RT @username to all forwarded tweets, where RT means retweet and @username refers to the name of the user who originally posted the tweet.</p>
          <p>In our work, we want to measure how many original tweets and retweets are present in the dataset. To find the number of original tweets, we counted all tweets that do not start with RT @; to find the number of retweets, we counted all tweets that start with RT @.</p>
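          <p>A minimal Python sketch of this counting rule, applied to hypothetical messages, is shown below. It relies only on the text prefix, as described above, so a message that forwards content without starting with RT @ is counted as an original tweet.

```python
def split_tweets(tweets):
    """Return (original tweets, retweets) counted with the 'RT @' prefix rule."""
    retweets = sum(1 for text in tweets if text.startswith("RT @"))
    return len(tweets) - retweets, retweets


# Hypothetical sample messages
sample = [
    "RT @userB: Fees must fall",
    "Traffic is backed up on the highway this morning",
    "RT @userC: Watching the budget debate tonight",
    "Great match today",
]
originals, retweets = split_tweets(sample)
print(originals, retweets)                       # 2 2
print(f"{retweets / len(sample):.2%} retweets")  # 50.00% retweets
```
</p>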
        </sec>
        <sec id="sec-3-1-2">
          <title>Relationship between topics and user network graph</title>
          <p>
            We created a social graph using retweets. Galuba et al. (2010) showed that retweets are the most powerful mechanism for diffusing information and a strong indication of the direction of information flow on Twitter. We built the graph from retweets because we wanted to observe and measure how a graph forms around the tweets. Users are the vertices of the graph, and we add an edge from user @A to user @B whenever @A retweets a tweet from @B. We treat the graph as undirected, so an edge from @A to @B also connects @B back to @A. All loops, which form when a user retweets his or her own tweet, are discarded from the graph. We also ignore duplicate user interactions, so that only unique user interactions are represented in the graph. Our graph had 30,114 vertices and 55,578 edges. In this paper we analyse the graph at two levels: the network level, which is the view of the entire graph, and the group level, which is the view of the sub-graphs (communities) in the graph.
          </p>
          <p>
            <bold>Network level.</bold> At the network level we calculated the betweenness centrality of all nodes in the graph. Freeman (1978) defines betweenness centrality as follows: let g<sub>ij</sub> denote the number of geodesic paths from node i to node j, and let g<sub>ikj</sub> denote the number of geodesic paths from i to j that pass through intermediary k. Then the betweenness centrality of k is
            <disp-formula>
              <tex-math>C_B(k) = \sum_{\{i,j\}:\; i \neq j,\; i \neq k,\; j \neq k} \frac{g_{ikj}}{g_{ij}}</tex-math>
            </disp-formula>
            where the sum runs over unordered pairs of nodes. Betweenness centrality measures the influence, or centrality, of a node in a graph. By this definition, a node with high betweenness centrality sits at a connection point between subgraphs and plays a major role in moving data from one subgraph to another. Freeman applied betweenness to connected, undirected graphs. Social networks often share common characteristics: natural clusters form, but the clusters do not partition the graph
            <xref ref-type="bibr" rid="ref26">(Mislove et al., 2007)</xref>. We use this characteristic to assume that our graph is largely connected, so that betweenness can be applied.
          </p>
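          <p>The graph construction and Freeman's betweenness can be sketched in Python as follows. The input format (a list of retweeter and original-author pairs) and the helper names are assumptions for illustration. Betweenness is computed with Brandes' algorithm, a standard exact method for unweighted graphs that, for every node k, sums g_ikj / g_ij over unordered pairs of other nodes; we do not know which implementation the study itself used.

```python
from collections import defaultdict, deque


def build_retweet_graph(retweets):
    """Undirected, simple graph from (retweeter, original_author) pairs."""
    adj = defaultdict(set)
    for retweeter, author in retweets:
        if retweeter == author:      # a self-retweet would form a loop: discard
            continue
        adj[retweeter].add(author)   # sets collapse duplicate interactions
        adj[author].add(retweeter)   # undirected: A-B also connects B to A
    return dict(adj)


def betweenness(adj):
    """Brandes' algorithm for unweighted graphs; each pair {i, j} counted once."""
    scores = dict.fromkeys(adj, 0.0)
    for source in adj:
        order = []                   # nodes in order of discovery
        preds = defaultdict(list)    # predecessors on geodesics from source
        sigma = defaultdict(int)     # number of geodesics from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = defaultdict(float)   # dependency of source on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # In an undirected graph every pair is seen from both endpoints, so halve.
    return {node: score / 2 for node, score in scores.items()}


pairs = [("A", "B"), ("A", "B"), ("C", "B"), ("B", "B"), ("D", "C")]
graph = build_retweet_graph(pairs)
edge_count = sum(len(nbrs) for nbrs in graph.values()) // 2
print(edge_count)            # 3
print(betweenness(graph))    # {'A': 0.0, 'B': 2.0, 'C': 2.0, 'D': 0.0}
```

In the example, the duplicate A-B retweet and the B self-retweet are dropped, leaving the path A-B-C-D; each inner node lies on the geodesics of two of the six node pairs and so scores 2.</p>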
          <p>
            We also performed another measurement at the network level, which we call the resourceful measure: for each node in the graph, we counted how many of its tweets have been retweeted at least once. A node with a high resourceful measure has a high number of tweets retweeted at least once by other users, so the resourceful measure captures how many tweets each node has contributed to the graph. In our work, we compared the resourceful measure with the betweenness centrality of nodes to find the relationship between the top producers of tweets in the graph and the top users who propagate tweets to subgraphs. The Jaccard similarity coefficient
            <xref ref-type="bibr" rid="ref15">(Jaccard, 1902)</xref>
            is a common index for binary variables, defined as the size of the intersection divided by the size of the union of two sets. Given two groups B and C, J(B, C) = a / (a + b + c), where a is the number of elements present in both groups, b is the number of elements present only in group B, and c is the number of elements present only in group C. The Jaccard coefficient is a number between 0 and 1: a coefficient of 0 means the two groups have no elements in common, and a coefficient of 1 means the two groups are identical. We used the Jaccard coefficient to measure the similarity between the nodes with high betweenness centrality and the nodes with high resourceful measure.
          </p>
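          <p>The resourceful measure and the Jaccard comparison of two top-k user lists can be sketched in Python as below. The input format, one (author, tweet id) pair per retweet event, and the betweenness scores in the example are hypothetical; in practice the betweenness values would come from the network-level computation.

```python
from collections import Counter


def resourceful_measure(retweet_events):
    """Per author, the number of distinct tweets retweeted at least once.

    retweet_events: (author, tweet_id) pairs, one per retweet (assumed format).
    """
    distinct = set(retweet_events)   # a tweet counts once however often it is retweeted
    return Counter(author for author, _ in distinct)


def top_k(scores, k):
    """The k users with the highest scores (ties broken arbitrarily)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def jaccard(group_b, group_c):
    """a / (a + b + c): shared elements divided by all distinct elements."""
    union = group_b.union(group_c)
    if not union:
        return 0.0
    return len(group_b.intersection(group_c)) / len(union)


events = [("ann", "t1"), ("ann", "t1"), ("ann", "t2"), ("bob", "t3"),
          ("cat", "t4"), ("cat", "t5"), ("cat", "t6")]
resourceful = resourceful_measure(events)                        # ann 2, bob 1, cat 3
betweenness = {"ann": 5.0, "dan": 4.0, "bob": 0.5, "cat": 0.0}   # hypothetical scores
print(jaccard(top_k(resourceful, 2), top_k(betweenness, 2)))     # 0.3333333333333333
```

With two top-50 lists, as used in the results, b = c = 50 - a, so J = a / (100 - a); for example, 19 shared users give 19/81, about 0.23.</p>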
          <p>
            <bold>Group level.</bold> We partitioned the graph into communities. Xie, Kelley and Szymanski (2013) reviewed the state of the art in overlapping community detection algorithms. They reviewed a total of fourteen algorithms and concluded that, for networks with low overlapping density, SLPA
            <xref ref-type="bibr" rid="ref33">(Xie et al., 2011)</xref>, OSLOM
            <xref ref-type="bibr" rid="ref18">(Lancichinetti et al., 2011)</xref>, Game
            <xref ref-type="bibr" rid="ref5">(Chen et al., 2010)</xref> and COPRA
            <xref ref-type="bibr" rid="ref11">(Gregory, 2010)</xref>
            offer better performance than the other tested algorithms, while for networks with high overlapping density and high overlapping diversity, both SLPA and Game provide relatively stable performance. We evaluated two algorithms, COPRA and SLPA, and observed that SLPA performed better than COPRA on our graph in both computation time and modularity. The modularity of a partition is a scalar value between -1 and 1 that measures the density of links inside communities as compared to links between communities
            <xref ref-type="bibr" rid="ref9">(Girvan and Newman, 2002)</xref>. Based on this evaluation, we used the SLPA overlapping algorithm for community detection in the graph.
          </p>
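          <p>For reference, the sketch below computes the standard Newman and Girvan modularity, Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of the nodes in c. It handles only a disjoint partition: SLPA produces overlapping communities, for which extended modularity definitions exist, and the text does not state which variant was used, so this illustrates the concept rather than the evaluation actually performed. The example graph is hypothetical.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman-Girvan modularity of a disjoint partition of an undirected graph.

    edges: list of unique (u, v) pairs without loops.
    communities: list of disjoint node sets covering the graph.
    """
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    q = 0.0
    for community in communities:
        # L_c: edges with both endpoints inside the community
        inside = sum(1 for u, v in edges if u in community and v in community)
        # d_c: total degree of the community's nodes
        total_degree = sum(degree[n] for n in community)
        q += inside / m - (total_degree / (2 * m)) ** 2
    return q


# Hypothetical graph: two triangles joined by the single edge c-d
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
print(modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}]))
```

For the two triangles, Q = 2 × (3/7 - 1/4) = 5/14 ≈ 0.357, whereas putting every node in a single community gives Q = 0.</p>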
          <p>Topic categories: controversial topics, developmental topics, entertainment topics, political topics, road accident topics, national event topics, and other.</p>
          <p><bold>Results and Discussion.</bold> This section discusses our experimental setup and results.</p>
          <p>For this experiment, we used the trending topics in South Africa shown in Table 1 to download tweets from 4 June 2016 to 19 June 2016. Twitter uses a proprietary algorithm to identify trending topics in Twitter data; trending topics can be either hashtagged or non-hashtagged words. We manually observed the trending topics in South Africa on the Twitter website for 16 days and used the Web API to download 131,790 tweets from 37,876 Twitter accounts. The topics were categorized into seven groups: controversial topics, developmental topics, entertainment topics, political topics, road accident topics, national event topics and other.</p>
          <p>We begin with the results of language detection. During the experiment, we noticed that tweets were repeated in the dataset, because users can retweet the same tweet. Before detecting the language, we therefore filtered out all repeated tweets, which reduced the dataset to 66,378 tweets. After filtering, 94.64% of tweets were detected as English, 2.61% as Afrikaans and 2.75% as other, meaning that the tweet was neither English nor Afrikaans. The results show that, despite the country's many languages, South Africans tweet in a common language. This pattern suggests that people tweet so that their messages can be read across a larger spectrum of the population.</p>
          <p>The next result describes tweet-retweet behaviour. The downloaded dataset contained 58.88% original tweets and 41.12% retweets. This pattern suggests that there is more original contribution in public debates.</p>
          <p>
            The last set of results shows the analysis of the social graph. Our results show that 79.5% of users in our dataset participate in conversations. To measure participation in conversations, we counted all users in our dataset who retweeted other users’ tweets or whose tweets were retweeted by others. We used the Jaccard coefficient to measure the similarity between users with high betweenness centrality, who play a major role in the movement of tweets in the graph, and users with a high resourceful measure, who have a high number of tweets retweeted at least once by other users. We took the 50 users with the highest resourceful measure and the 50 users with the highest betweenness centrality and computed the Jaccard coefficient between the two groups. Our calculation yielded a coefficient of 0.23. This result suggests that the top users who provide information in the graph are largely not the top users who propagate tweets through communities. Finally, we compared topics in the communities to find overlaps. SLPA
            <xref ref-type="bibr" rid="ref33">(Xie et al., 2011)</xref>
            was used to partition the graph into communities. Because SLPA is a nondeterministic algorithm, we ran it 11 times and recorded the average performance. The algorithm produced 2,200 communities with an overlap of 7.3%, which indicates that our graph has a low overlapping density. Table 2 shows that all communities tweeted about Oscar Pistorius and that there is no clear-cut division among communities with regard to topics. Although some communities focus on certain topics (groups 5 and 10 talk more about political topics, and group 3 about entertainment topics), all communities talk about other issues too. These graph patterns suggest that citizens participate in public debates on a variety of topics.
          </p>
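          <p>The participation count described above can be sketched in Python as follows; the input format and the helper name are hypothetical. The last line is only an arithmetic cross-check: the ratio of graph vertices (30,114) to downloaded accounts (37,876) is about 79.5%, which is consistent with the reported figure if every graph vertex is one of the downloaded accounts. That is our assumption, not something the text states.

```python
def participation_rate(retweet_pairs, accounts):
    """Share of accounts that retweeted another user or were retweeted by one."""
    participants = set()
    for retweeter, author in retweet_pairs:
        if retweeter != author:          # self-retweets are not conversation
            participants.update((retweeter, author))
    return len(participants.intersection(accounts)) / len(accounts)


# Hypothetical toy data: u1 retweets u2, u3 only retweets itself, u4 is silent.
toy_rate = participation_rate([("u1", "u2"), ("u3", "u3")], {"u1", "u2", "u3", "u4"})
print(toy_rate)  # 0.5

# Cross-check with the reported counts: graph vertices / downloaded accounts.
print(f"{30_114 / 37_876:.1%}")  # 79.5%
```
</p>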
          <p><bold>Conclusions and Future Work.</bold> We presented social attributes that help identify patterns that measure citizen participation in public debates in South Africa. Africa is highly multilingual, hence we chose the use of language as an attribute that can indicate participation in online public discussions. We also considered user retweeting behaviour and how topics relate to online communities. This exploratory study is a first step in the Twitter analysis of South African online data. This paper considers only a snapshot of South African Twitter data; in future work, we aim to consider the temporal aspects of the graph.
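            <table-wrap id="tab2">
              <label>Table 2</label>
              <table>
                <thead>
                  <tr><th>Group</th><th>Value</th></tr>
                </thead>
                <tbody>
                  <tr><td>2</td><td>1704</td></tr>
                  <tr><td>3</td><td>1330</td></tr>
                  <tr><td>4</td><td>333</td></tr>
                  <tr><td>5</td><td>300</td></tr>
                  <tr><td>6</td><td>288</td></tr>
                  <tr><td>7</td><td>273</td></tr>
                  <tr><td>8</td><td>147</td></tr>
                  <tr><td>9</td><td>126</td></tr>
                  <tr><td>10</td><td>124</td></tr>
                  <tr><td>11</td><td>114</td></tr>
                  <tr><td>12</td><td>102</td></tr>
                </tbody>
              </table>
            </table-wrap></p>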
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>
          , Paul Cook, Marco Lui, Andrew MacKinlay, and
          <string-name>
            <given-names>Li</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>How noisy social media text, how different social media sources?</article-title>
          .
          <source>Proceedings of the 6th International Joint Conference on Natural Language Processing</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Francesco Bonchi, and
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Manco</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Cascade-based community detection</article-title>
          .
          <source>Sixth ACM International Conference on Web Search and Data Mining</source>
          , pages
          <fpage>33</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Javier</given-names>
            <surname>Borge-Holthoefer</surname>
          </string-name>
          , Walid Magdy, Kareem Darwish, and
          <string-name>
            <given-names>Ingmar</given-names>
            <surname>Weber</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Content and network dynamics behind Egyptian political polarization on Twitter</article-title>
          .
          <source>ACM Conference on Computer Supported Cooperative Work and Social Computing</source>
          , (
          <volume>18</volume>
          ):
          <fpage>700</fpage>
          -
          <lpage>711</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Ralf</given-names>
            <surname>Brown</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Selecting and weighting n-grams to identify 1100 languages</article-title>
          .
          <source>16th International Conference on Text, Speech and Dialogue</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Duanbing</given-names>
            <surname>Chen</surname>
          </string-name>
          , Mingsheng Shang, Zehua Lv, and
          <string-name>
            <given-names>Yan</given-names>
            <surname>Fu</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Detecting overlapping communities of weighted networks via a local algorithm</article-title>
          .
          <source>Physica A: Statistical Mechanics and its Applications</source>
          , pages
          <fpage>4177</fpage>
          -
          <lpage>4187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>Ediger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Karl</given-names>
            <surname>Jiang</surname>
          </string-name>
          , Jason Riedy,
          <string-name>
            <given-names>David A.</given-names>
            <surname>Bader</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Courtney</given-names>
            <surname>Corley</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Massive social network analysis: Mining Twitter for social good</article-title>
          .
          <source>39th International Conference on Parallel Processing</source>
          , pages
          <fpage>583</fpage>
          -
          <lpage>593</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Linton C.</given-names>
            <surname>Freeman</surname>
          </string-name>
          .
          <year>1978</year>
          .
          <article-title>Centrality in social networks: Conceptual clarification</article-title>
          .
          <source>Social Networks</source>
          , pages
          <fpage>215</fpage>
          -
          <lpage>239</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Wojciech</given-names>
            <surname>Galuba</surname>
          </string-name>
          , Karl Aberer, Dipanjan Chakraborty, Zoran Despotovic, and
          <string-name>
            <given-names>Wolfgang</given-names>
            <surname>Kellerer</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Outtweeting the twitterers - predicting information cascades in microblogs</article-title>
          .
          <source>WOSN'10: Proceedings of the 3rd Conference on Online Social Networks</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Michelle</given-names>
            <surname>Girvan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark E. J.</given-names>
            <surname>Newman</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Community structure in social and biological networks</article-title>
          .
          <source>Proceedings of the National Academy of Sciences</source>
          ,
          <volume>99</volume>
          , pages
          <fpage>7821</fpage>
          -
          <lpage>7826</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Moises</given-names>
            <surname>Goldszmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marc</given-names>
            <surname>Najork</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stelios</given-names>
            <surname>Paparizos</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Boot-strapping language identifiers for short colloquial postings</article-title>
          .
          <source>European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Steve</given-names>
            <surname>Gregory</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Finding overlapping communities in networks by label propagation</article-title>
          .
          <source>arXiv:0910.5516 [physics.soc-ph].</source>
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Nosakhere</given-names>
            <surname>Griffin</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Fees Must Fall and the possibility of a new African university</article-title>
          .
          <source>Face2FaceAfrica</source>
          , www.face2faceafrica.com.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Lichan</given-names>
            <surname>Hong</surname>
          </string-name>
          and
          <string-name>
            <given-names>Gregorio</given-names>
            <surname>Convertino</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Language matters in Twitter: A large scale study</article-title>
          .
          <source>Fifth International AAAI Conference on Weblogs and Social Media.</source>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Mindaugas</given-names>
            <surname>Idzelis</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Jazzy: The Java open source spell checker</article-title>
          . http://jazzy.sourceforge.net/.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Paul</given-names>
            <surname>Jaccard</surname>
          </string-name>
          .
          <year>1902</year>
          .
          <article-title>Lois de distribution florale</article-title>
          .
          <source>Bulletin de la Société Vaudoise des Sciences Naturelles</source>
          , pages
          <fpage>67</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Akshay</given-names>
            <surname>Java</surname>
          </string-name>
          , Xiaodan Song,
          <string-name>
            <given-names>Tim</given-names>
            <surname>Finin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Belle</given-names>
            <surname>Tseng</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Why we twitter: understanding microblogging usage and communities</article-title>
          .
          <source>9th WebKDD and 1st SNA-KDD 2007 workshop on Web mining and social network analysis</source>
          , pages
          <fpage>56</fpage>
          -
          <lpage>65</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Lancichinetti</surname>
          </string-name>
          and
          <string-name>
            <given-names>Santo</given-names>
            <surname>Fortunato</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Community detection algorithms: a comparative analysis</article-title>
          .
          <source>arXiv:0908.1062</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Lancichinetti</surname>
          </string-name>
          , Filippo Radicchi, José J. Ramasco, and
          <string-name>
            <given-names>Santo</given-names>
            <surname>Fortunato</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Finding statistically significant communities in networks</article-title>
          .
          <source>PLoS One</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Lui</surname>
          </string-name>
          and
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Accurate language identification of Twitter messages</article-title>
          .
          <source>NICTA VRL.</source>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Lui</surname>
          </string-name>
          , Jey Han Lau, and
          <string-name>
            <given-names>Timothy</given-names>
            <surname>Baldwin</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Automatic detection and language identification of multilingual documents</article-title>
          .
          <source>Proceedings of the 2004 ACM Symposium on Applied Computing</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Martin</given-names>
            <surname>Majlis</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Yet another language identifier</article-title>
          .
          <source>Student Research Workshop at the 13th Conference of the European Chapter of the Association for Computational Linguistics</source>
          , pages
          <fpage>46</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Yuexin</given-names>
            <surname>Mao</surname>
          </string-name>
          , Wei Wei, and
          <string-name>
            <given-names>Bing</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Correlating S&amp;P 500 stocks with Twitter data</article-title>
          .
          <source>ACM International Workshop on Hot Topics on Interdisciplinary Social Networks Research</source>
          , (
          <volume>1</volume>
          ):
          <fpage>69</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Juan</given-names>
            <surname>Martinez-Romo</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lourdes</given-names>
            <surname>Araujo</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Detecting malicious tweets in trending topics using a statistical analysis of language</article-title>
          .
          <source>Expert Systems with Applications: An International Journal</source>
          ,
          <volume>40</volume>
          :
          <fpage>2992</fpage>
          -
          <lpage>3000</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>McCandless</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Accuracy and performance of Google's compact language detector</article-title>
          . http://blog.mikemccandless.com/2011/10/accuracy-and-performance-of-googles.html.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Marcin</given-names>
            <surname>Miłkowski</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Developing an open-source, rule-based proofreading tool</article-title>
          .
          <source>Software: Practice and Experience</source>
          ,
          <volume>40</volume>
          :
          <fpage>543</fpage>
          -
          <lpage>566</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Alan</given-names>
            <surname>Mislove</surname>
          </string-name>
          , Massimiliano Marcon,
          <string-name>
            <given-names>Krishna P.</given-names>
            <surname>Gummadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Druschel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Bobby</given-names>
            <surname>Bhattacharjee</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Measurement and analysis of online social networks</article-title>
          .
          <source>Proceedings of the 7th ACM SIGCOMM conference on Internet measurement</source>
          , pages
          <fpage>29</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Corrado</given-names>
            <surname>Monti</surname>
          </string-name>
          , Alessandro Rozza, Giovanni Zappella, Matteo Zignani, Adam Arvidsson, and
          <string-name>
            <given-names>Elanor</given-names>
            <surname>Colleoni</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Modelling political disaffection from Twitter data</article-title>
          .
          <source>International Workshop on Issues of Sentiment Discovery and Opinion Mining</source>
          , (
          <volume>2</volume>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Shuyo</given-names>
            <surname>Nakatani</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Language detection library (slides)</article-title>
          . http://www.slideshare.net/shuyo/language-detection-library-for-java.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>Shuyo</given-names>
            <surname>Nakatani</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Short text language detection with infinity-gram</article-title>
          . http://shuyo.wordpress.com/2012/05/17/short-text-language-detection-with-infinity-gram/.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>Jaimie Y.</given-names>
            <surname>Park</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chin-Wan</given-names>
            <surname>Chung</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>When daily deal services meet Twitter: understanding Twitter as a daily deal marketing platform</article-title>
          .
          <source>Annual ACM Web Science Conference</source>
          , (
          <volume>4</volume>
          ):
          <fpage>233</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <given-names>Frank</given-names>
            <surname>Scheelen</surname>
          </string-name>
          .
          <year>2003</year>
          . libtextcat. http://software.wiseguys.nl/libtextcat/.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          <string-name>
            <given-names>Takeshi</given-names>
            <surname>Sakaki</surname>
          </string-name>
          , Makoto Okazaki, and
          <string-name>
            <given-names>Yutaka</given-names>
            <surname>Matsuo</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Earthquake shakes Twitter users: Real-time event detection by social sensors</article-title>
          .
          <source>International Conference on World Wide Web</source>
          ,
          (
          <volume>19</volume>
          ):
          <fpage>851</fpage>
          -
          <lpage>860</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          <string-name>
            <given-names>Jierui</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Boleslaw K.</given-names>
            <surname>Szymanski</surname>
          </string-name>
          , and Xiaoming Liu.
          <year>2011</year>
          .
          <article-title>SLPA: Uncovering overlapping communities in social networks via a speaker-listener interaction dynamic process</article-title>
          .
          <source>IEEE ICDM workshop on DMCCI.</source>
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          <string-name>
            <given-names>Jierui</given-names>
            <surname>Xie</surname>
          </string-name>
          , Stephen Kelley, and
          <string-name>
            <given-names>Boleslaw K.</given-names>
            <surname>Szymanski</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Overlapping community detection in networks: The state-of-the-art and comparative study</article-title>
          .
          <source>ACM Computing Surveys (CSUR)</source>
          ,
          <volume>45</volume>
          (
          <issue>4</issue>
          ): 43.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          <string-name>
            <given-names>Chao</given-names>
            <surname>Yang</surname>
          </string-name>
          , Robert Chandler Harkreader, and
          <string-name>
            <given-names>Guofei</given-names>
            <surname>Gu</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Die free or live hard? Empirical evaluation and new design for fighting evolving Twitter spammers</article-title>
          .
          <source>14th international conference on Recent Advances in Intrusion Detection</source>
          ,
          <volume>6961</volume>
          :
          <fpage>318</fpage>
          -
          <lpage>337</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          <string-name>
            <given-names>Reza</given-names>
            <surname>Zafarani</surname>
          </string-name>
          , Mohammad Ali Abbasi, and Huan Liu.
          <year>2014</year>
          .
          <article-title>Social Media Mining</article-title>
          . Cambridge University Press, NY, USA.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>