=Paper=
{{Paper
|id=Vol-1893/Paper5
|storemode=property
|title=A Distributed Approach for the Analysis of Discussions in Twitter
|pdfUrl=https://ceur-ws.org/Vol-1893/Paper5.pdf
|volume=Vol-1893
|authors=Teresa	Alsinet,Josep Argelich,Ramón Béjar,Jordi Planes,Joel Cemeli,Cristian Sanahuja
|dblpUrl=https://dblp.org/rec/conf/ijcai/AlsinetABPCS17
}}
==A Distributed Approach for the Analysis of Discussions in Twitter==
<pdf width="1500px">https://ceur-ws.org/Vol-1893/Paper5.pdf</pdf>
<pre>
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


                    A Distributed Approach for the Analysis of
                              Discussions in Twitter

                Teresa Alsinet1 , Josep Argelich1 , Ramón Béjar1 , Jordi Planes1 , Joel Cemeli2 ,
                                           and Cristian Sanahuja2
                                1
                                 INSPIRES Research Center – University of Lleida
                                       Jaume II, 69 – 25001 Lleida, Spain
                            {tracy, jargelich, ramon, jplanes}@diei.udl.cat ? ??
                                    2
                                      Polytechnic School, University of Lleida
                                       Jaume II, 69 – 25001 Lleida, Spain
                              jcs13@alumnes.udl.cat, csanahuja10@gmail.com ? ? ?


                      Abstract. In a recent work we have developed an argumentative ap-
                      proach for discovering relevant opinions in Twitter. A Twitter discussion
                      is modeled as a weighted argument graph where each node denotes a
                      tweet, each edge denotes a criticism relationship between a pair of tweets
                      of the discussion and each node is attached with a weight that denotes
                      the social relevance of the corresponding tweet in the discussion. In the
                      social network Twitter, a tweet always refers to previous tweets in the
                      discussion, so the obtained underlying argument graph is acyclic. Based
                      on this structural feature, in this work, we introduce a distributed algo-
                      rithm for computing the set of globally accepted opinions of a Twitter
                      discussion. The set of accepted opinions is extracted by mapping the
                      weighted argument graph into a valued argumentation framework and it
                      is computed as the biggest set of tweets of the discussion that satisfies
                      that it is consistent.


               1    Motivation and Antecedents

               In order to understand what are the major accepted and rejected opinions in
               different domains by Twitter users, in a recent work [1] we have developed a
               system for analysis of discussions in Twitter.
                   The system architecture has two main components: a discussion retrieval and
               a reasoning system. The discussion retrieval component allows us to move from
                ?
                 Copyright c 2017 for the individual papers by the papers’ authors. Copying per-
                 mitted for private and academic purposes. This volume is published and copyrighted
                 by its editors.
              ??
                 This work was partially funded by the Spanish MICINN Projects TIN2014-53234-
                 C2-2-R and TIN2015-71799-C2-2-P.
             ???
                 The fifth author and sixth author acknowledge the finantial support of a student
                 grant from the MECD/AGAUR agencies of the Spanish Government and the finan-
                 tial support of a student grant from the Student Vice-Rectorate of the University of
                 Lleida, respectively.


                                                         45
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               a discussion in Twitter (a set of tweets) in natural language to a weighted graph
               which is computed taking into account criticism relationships between tweets
               and three different attributes of a tweet: the number of followers of the author,
               the number of retweets and the number of favorites. The reasoning system com-
               ponent maps the weighted graph into a weighted argumentation framework and
               the set of socially accepted tweets in the discussion is evaluated from the weight
               assigned to each tweet and the criticism relationships between the tweets of the
               discussion, and it is computed as the ideal semantics [8] of a valued abstract
               argumentation framework [2].
                   In abstract argumentation [7], a graph is used to represent a set of argu-
               ments and counterarguments. Each node is an argument and each edge denotes
               an attack between arguments. Several different kinds of semantics for abstract
               argumentation frameworks have been proposed that highlight different aspects
               of argumentation (for reviews see [3,4,15]). Usually, semantics are given to ab-
               stract argumentation frameworks in terms of extensions. For a specific extension
               an argument is either accepted, rejected, or undecided and, usually, there is a
               set of extensions that is consistent with the semantic context.
                   The system developed in [1] builds a weighted argument graph for a Twitter
               discussion, where each node denotes a tweet, each edge denotes a criticism re-
               lationship between a pair of tweets of the discussion and each node is attached
               with a weight that denotes the social relevance of the corresponding tweet in the
               discussion and it is computed from some tweet’s attributes. In the social network
               Twitter, a tweet always answers or refers to previous tweets in the discussion,
               so the obtained underlying argument graph is acyclic.
                   Based on the fact that the graphs we obtain are acyclic, in this work, we
               introduce and investigate a distributed implementation of the skeptical approach
               based on the ideal semantics of a valued abstract argumentation framework. A
               tweet defeats a second one if it criticizes the second one and it has at least the
               same social relevance as the second one. The ideal semantics for valued abstract
               argumentation guarantees that the set of tweets in the solution is the maximal
               set of tweets that satisfies that it is consistent, in the sense that there are no
               defeaters among them. But the solution also satisfies that it defends against
               defeaters from the tweets outside the solution, in the sense we will see in the
               formal definition of ideal semantics.
                   The social relevance is measured with a social valuation function that for
               each tweet considers different information sources from the social network, such
               as the number of followers of the author, the number of retweets and the number
               of favorites. The distributed approach can be of special relevance for assessing
               Twitter discussions that involve a large number of tweets and the system can
               be applied in fields where identifying groups of tweets globally compatible or
               consistent, but at the same time that are widely accepted, is of particular interest,
               such as for instance for the assistance and guidance of marketing and policy
               makers.
                   After this introduction, in the next section, we formalize the structure to
               model Twitter discussions and, in Section 3, we define the reasoning model for


                                                        46
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               computing their solutions. Then, in Section 4, we present a novel distributed
               strategy for the implementation of the reasoning model based on the ideal se-
               mantics for a valued abstract argumentation framework. We end the paper with
               some conclusions and a discussion of future work. As far as we know, this is the
               first distributed algorithm presented to solve such problem.


               2    A weighted graph for Twitter discussions
               Following the approach proposed in [1], in this section, we introduce a computa-
               tion structure (a weighted graph) to represent a Twitter discussion considering
               only criticism relationships between pairs of tweets.

               Definition 1. (Twitter Discussion) A Twitter discussion Γ is a non-empty set
               of tweets. A tweet t ∈ Γ is a tuple t = (m, a, fl, r, fv), where m is the up to 140
               characters long message of the tweet, a is the author’s identifier of the tweet,
               r ∈ N is the number of retweets and fv ∈ N is the number of favorites. Let
               t1 = (m1 , a1 , fl1 , r1 , fv1 ) and t2 = (m2 , a2 , fl2 , r2 , fv2 ) be a pair of tweets of a
               Twitter discussion Γ . We say that t1 answers t2 iff t1 is a reply to the tweet t2
               or t1 mentions (refers to) tweet t2 .

               Definition 2. (Discussion Graph) The Discussion Graph (DisG) for a Twitter
               discussion Γ is the directed graph (T, E) such that
                 – for every tweet in Γ there is a node in T and
                 – if tweet t1 = (m1 , a1 , fl1 , r1 , fv1 ) answers tweet t2 = (m2 , a2 , fl2 , r2 , fv2 ), with
                   a1 6= a2 , and m1 criticizes the claim expressed in m2 , there is a directed edge
                   (t1 , t2 ) in E.
               Only the nodes and edges obtained by applying this process belong to T and E,
               respectively.

                   Although our system allows us to analyze any discussion in Twitter (set of
               tweets), in this work we deal with discussions where a tweet answers previous
               tweets in the discussion. Moreover, in our approach, (t1 , t2 ) ∈ E iff t1 answers
               t2 and the message of tweet t1 does not agree with the claim expressed in the
               message of tweet t2 . So, the answers between tweets whose messages are not
               classified as criticisms, do not give rise to edges and, therefore, some nodes
               (tweets) of a discussion graph can be disconnected. Thus, we can find nodes for
               which the input or the output degree, or both, is zero.
                   Since the social network we are considering in this work is Twitter, every
               tweet of a discussion can reply at most one tweet, but can mention many tweets,
               and all of them are prior in the discussion. So, every tweet can answer and, in
               turn, can criticize many prior tweets of the discussion, and thus, every tweet can
               criticize many prior tweets from a same author and from different authors.
                   From an implementation point of view, in order to check if a tweet criticizes
               another tweet, we can use some of the components we have developed in [1]. On
               the one hand, to check if a tweet t1 replies a tweet t2 , the system can use the


                                                              47
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               data structure in the JSON format provided by the Twitter API. In particular,
               this fact can be easily checked from the attribute in reply to status id of the
               object of t1 , which provides the tweet identifier to which t1 replies. To check
               the set of mentions of t1 , the system searches for all authors mentions in the
               message of t1 . Every mention of an author is stored in the message with a label
               of the form: @<author identifier>. So, t1 mentions t2 whenever the author’s
               identifier of t2 is in the set of mentions of t1 .
                    On the other hand, to check if a tweet does not agree with the claim ex-
               pressed in a different tweet, the system uses an automatic labeling system based
               on Support Vector Machines (SVM). Our SVM model for labeling relations be-
               tween tweets considers different attributes obtained from the tweets of an answer
               (t1 , t2 ). On the one hand, we have attributes that count the number of occur-
               rences of relevant words in the tweets t1 and t2 . We have considered two kinds of
               words: regular words and stopwords. We have considered the inclusion of stop-
               words as attributes because the typical tweet is very short and the fraction of
               stopwords that can be giving information about the kind of answer could be
               relevant. For example, in the next tweet

               @ponpimpampum @LL_Sosa Jajajaja...!!!


               the stopwords ... and !!! give information about the sentiment associated to
               the tweet.
                   On the other hand, we also consider attributes that have to be computed
               from the text and from the additional information that comes with the tweets.
               In particular, for each tweet these attributes are the number of images and the
               number of URLs mentioned in the tweet, the number of positive and negative
               emoticons and the sentiment expressed by the tweet. Our labeling system incor-
               porates a sentiment analysis computation module [12,14] that given the set of
               words in a tweet it provides a sentiment value in the range [−5, +5], where -5
               is the most negative sentiment and +5 is the most positive sentiment. Finally,
               the sentiment value is incremented (or decremented) considering every positive
               (negative) emoticon.
                   Since SVM follows a supervised learning approach, we first have to train
               a model from an already labeled data set of answers. To this aim, we have
               collected a set of several Twitter discussions, on the Spanish language, and we
               have manually labeled the answers in the discussions to be able to train a SVM
               labeling model for Spanish discussions.
                   The training collection contains 12 discussions and a total of 582 pairs of
               tweets (answers). We have considered the creation of SVM models with different
               number of regular words (w) and different number of stopwords (s). The words
               in the collection are sorted by number of occurrences, and for an SVM model
               with w regular words, we select the first w most frequent regular words. The
               stopwords have been obtained from the natural language toolkit (NLTK) [6]
               and have been also ordered by number of occurrences, so we also select the s
               most frequent stopwords.


                                                        48
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


                   Using this training collection, we have trained four models with different
               values for w and s, in order to get a good labeling model. In order to compute
               the sentiment for the tweets in this collection, we have taken the AFINN data3
               used in [12,14] and we have translated the words to Spanish. See [1] for a detailed
               description of the SVM training model that we have implemented for checking
               criticism relationships between tweets from Spanish Twitter discussions.

               Definition 3. (Weighted Discussion Graph) A weighted discussion graph (WDisG)
               for a Twitter discussion Γ is a tuple hT, E, R, W i, where

                 – (T, E) is the DisG graph for Γ ,
                 – R is a nonempty set of ordered values and
                 – W is a weighting scheme W : T → R that assigns a weight value in R to
                   each tweet in T , representing the social relevance of the tweet.

                   Regarding the implementation, we instantiate the set of ordered values R
               to the natural numbers N and, for each node with tweet t = (m, a, fl, r, fv), we
               consider the following weighting scheme:

                                     W (t) = blog2 (fl + 20 ∗ r + 40 ∗ fv + 1)c,

               which takes into account not only the number of followers of the author, but
               also the number of retweets and favorites. This function allows us to quantify
               the orders of magnitude of the social relevance of tweets following the statistics
               about tweets and retweets defined in [5], trying to give each attribute a weight
               proportional to its relevance. From the statistics shown in [5], we observe that
               on weighting with twenty times the value of retweets and forty times the value
               of favorites, the magnitudes of the three attributes are comparable and one
               attribute does not dominate the others, since the number of followers is usually
               much bigger than the number of retweets and favorites. We finally compute the
               log2 function of the combined value, since we want to consider that one tweet is
               more relevant than another only if such combined weight is at least two times
               bigger for the first tweet. We will refer to this weighting scheme as fl1r20fv40.
                   Figure 1 shows a WDisG graph instance for a Twitter discussion obtained
               from the political domain using the fl1r20fv40 weighting scheme. Each tweet
               is represented as a node and each criticism answer as an edge. The root tweet
               of the discussion is labeled with 0 (the tweet that starts the discussion) and the
               other nodes are labeled with consecutive identifiers according to the temporal
               generation order of the tweets in the social network. The nodes that appear
               disconnected in the graph correspond to tweets that have participated in the
               discussion in response to other tweets, but have not been classified as criticisms
               answers (nodes 3 and 5). The discussion has a simple structure, possibly one
               of the most frequent in Twitter. A root tweet starts the discussion and some
               answers criticize it, and there are not many criticisms between non-root tweets.
               The discussion contains 13 tweets and 14 criticisms answers. Nodes are colored
                3
                    http://www2.compute.dtu.dk/~faan/data/AFINN.zip


                                                        49
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               in blue scale, where the darkness of the color is directly proportional to its weight
               with respect to the maximum value in the discussion.
                   Since our reasoning model defines the set of socially accepted tweets of a
               discussion as a set of tweets without effective conflicts, the solution is mainly
               defined by tweets that generate or receive criticism of other tweets and it is
               presented in next section.


                                          Fig. 1. WDisG graph instance.


               3    An argumentation-based reasoning model

               Once we have introduced our formal representation model of Twitter discussions
               with criticism relationships between pairs of tweets, the next key component is
               the definition of the reasoning model used to obtain the set of socially accepted
               tweets. To this end, we use a valued abstract argumentation framework [2] for
               modeling the weighted argumentation problem associated with a WDisG graph
               and ideal semantics [8] for defining the solution (the set of socially accepted
               tweets).
                   A valued argumentation framework (VAF) is a tuple hA, attacks, R, Val, Valprefi
               where A is a set of arguments, attacks is an irreflexive binary relation on A, R is
               a nonempty set of values, Val is a valuation function Val : A → R that assigns
               to each argument in A a weight value in R, and Valpref ⊆ R × R is a prefer-
               ence relation on R (transitive, irreflexive and asymmetric), reflecting the value
               preferences of arguments.
                   Given a Twitter discussion Γ and its WDisG graph G =hT, E, N, fl1r20fv40i
               based on the weighting scheme fl1r20fv40 : T → N, the VAF for G is the tu-
               ple F = hT, E, N, fl1r20fv40,¿i, where each tweet in T results in an argument
               of F, each criticism answer in E results in an attack between the arguments
               of F, N is the set of valuations or weights of the arguments, the weighting
               scheme fl1r20fv40 is the weighting valuation function of F, and the order
               relation > of N is the preference relation between the valuations or weights


                                                        50
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               of the arguments; i.e. a tweet t2 is more valued than a tweet t1 whenever
               fl1r20fv40(t2 ) > fl1r20fv40(t1 ).
                    Then, a defeat relation (or effective attack relation) between tweets (ar-
               guments) is defined as follows: defeats = {(t1 , t2 ) ∈ E | fl1r20fv40(t2 ) 6>
               fl1r20fv40(t1 )}.
                    Moreover, a set of tweets (arguments) S ⊆ T is conflict-free if for all t1 , t2 ∈
               S, (t1 , t2 ) 6∈ defeats, and a conflict-free set of tweets S ⊆ T is maximally ad-
               missible if for all t1 6∈ S, S ∪ {t1 } is not conflict-free and, for all t2 ∈ S, if
               (t1 , t2 ) ∈ defeats then there exists t3 ∈ S such that (t3 , t1 ) ∈ defeats. Finally, the
               set of socially accepted tweets of Γ , referred as the solution of Γ , is computed as
               the largest admissible conflict-free set of tweets S ⊆ T in the intersection of all
               maximally admissible conflict-free sets. Remark that the tweets of a discussion
               that do not generate nor receive criticism, are always part of the solution.
                    Figure 2 shows the VAF solution for the WDisG graph instance of Figure 1.
               The nodes colored in blue are the tweets in the solution and the nodes colored
               in gray are the rejected tweets, where the darkness of the color is directly pro-
               portional to its weight. According to the fl1r20fv40 valuation function, the
               tweets of the discussion are stratified in three levels denoting their relevance
               in the discussion. For each level, we find the following sets of tweets: level 0:
               {3, 2, 4, 5, 9, 10, 11}, level 1: {6, 7, 8, 12}, level 2: {0, 1}, being level 0 the lowest
               level and level 2 the highest one. The solution contains 7 tweets (Tweets 1, 4,
               7, 8, 9, 11 and 12) of the 13 tweets of the discussion, and 6 tweets are rejected
               (Tweets 0, 2, 3, 5, 6 and 10). On the one hand, Tweet 1 defeats the root tweet,
               since Tweet 1 attacks the root tweet and both have the same weight. The same
               happens with Tweets 3, 5, 6 and 10, since Tweet 4 attacks Tweet 3, Tweet 11
               attacks Tweets 5 and 10, and Tweets 7 and 12 attack Tweet 6. On the other
               hand, Tweet 8 defeats Tweet 2, since Tweet 8 attacks Tweet 2 and Tweet 8 is
               heavier than Tweet 2. Finally, Tweets 1, 4, 7, 8, 9, 11 and 12 are accepted since
               they do not have any defeater in the solution. Notice that in this discussion with
               high controversy around Tweets 0 and 6 (with a high number of attacks), we
               end up rejecting both tweets due to the weight they get during the discussion.


                                            Fig. 2. WDisG graph solution.


                                                          51
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


                  In [1] we implemented a reasoning system for computing the VAF solution of
               a WDisG graph based on the algorithm for computing the ideal extension for an
               argumentation framework presented in [9], but adapting it to work with valued
               arguments. Regarding the implementation we used an approach based on Answer
               Set Programming (ASP) available in the argumentation system ASPARTIX [10],
               but we extended it to work with VAFs, as the current implementation in AS-
               PARTIX only works with non-valued arguments. To develop such extension we
               modified the manifold ASP program explained in [11] incorporating:
                 – the valuation function for arguments,
                 – the preference relation between argument valuations and
                 – the defeat relation relation between arguments (effective attack).
                   Now in this work, we define a distributed strategy for implementing the
               underlying reasoning algorithm for computing the ideal extension for a VAF.
               The algorithm takes a WDisG graph of a Twitter discussion and outputs the set
               of accepted tweets based on the valuation function, the preference relation and
               the computation of the largest admissible conflict-free set of tweets according to
               the defeat relation.


               4    Distributed solution computation
               We design a distributed strategy for computing the solution of a WDisG graph
               using the distributed model of computation of Pregel [13]. This model is appro-
               priate for our problem, because the input for a Pregel algorithm is a directed
               graph, where the nodes can be in different states, and the goal of a distributed
               algorithm in Pregel is to compute the state of each node based on the state of
               the nodes’ neighbors. Any Pregel algorithm starts initializing each node to some
               initial state. Then, the distributed computation follows a sequence of supersteps
               separated by global synchronization points until the algorithm finishes a point
               where every node is happy with its current state. This computation model is
               actually inspired by Valiant’s Bulk Synchronous Parallel model [16].
                   Within each superstep the nodes compute their state in parallel, executing
               a specific function that computes the new state of the node taking into account
               the possible messages sent by the nodes’ neighbors in the previous superstep.
               Then, as a byproduct of the computation of the node state, some messages may
               be sent to some of the nodes’ neighbors, that they will be processed in the
               next superstep. The idea is that the messages are used by nodes to indicate
               some change in their state, that could have some influence on the state of their
               neighbors in the next superstep. The superstep finishes when every node has
               computed its state and has sent the necessary messages to its neighbors.
                   In the computation model of Pregel, the input can be any directed graph.
               However, the algorithm we present here works only with discussion graphs that
               are acyclic, such that it allows us to solve discussions of big size with an efficient
               polynomial time distributed algorithm, where the state of each node depends
               only on the state of its attacking nodes. Observe that in the case of Twitter,


                                                        52
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               where a tweet only answers previous tweets, the discussion graphs are always
               acyclic, so restricting the algorithm to consider only acyclic discussion graphs is
               not a real restriction for Twitter.
                   For the distributed computation of the solution (ideal extension) for a dis-
               cussion acyclic graph, we propose a Pregel algorithm where each node can be
               in two states: accepted or not accepted (rejected), but a node also stores a de-
               featers count, to keep track of the number of accepted defeaters the node has, to
               actually compute its acceptance state. Initially, every node starts in the rejected
               state and with its defeaters counter equal to zero.


               Algorithm 1 Compute the acceptance state for a node a
                1: procedure a.CompAcceptance(i) . Update acceptance state of a at superstep i
                2:    a.storeCurrentAcceptanceState()
                3:    for all msg ∈ a.received(i-1) do              . Check defeaters count update
                4:        if (msg.type == attacker) then a.updateDefeaters( msg.value )
                5:    a.updateAcceptanceState()
                6:    if ( a.StateChanged() ) then      . a changed to accepted or to not accepted
                7:        sendToAllNeighbors( msg( attacker, (a.weight(), a.isAccepted() ) ) )
                8:    else
                9:        a.VoteToHalt()                   . Vote to finish distributed computation


                   Algorithm 1 shows the pseudocode of the function used by each node a
               to compute its state in a superstep i. 4 It works as follows. In the superstep
               i, a node a checks if its state has to be changed after updating its defeaters
               counter. The node a updates its defeaters counter based on the received messages
               (from the previous superstep i − 1) from any nodes v connected with a with an
               incoming edge (v, a). These messages are processed in the for loop of lines 3
               and 4. There are two possible changes that every message can produce on the
               defeaters counter of a. On the one hand, if in the previous superstep a node
               v, and such that (v, a) is an edge of the graph, changed its state to accepted
               then v sent a message to a of type attacker and with value (v.weight(), +1),
               indicating to a that its defeaters counter should be increased by one but only
               if v is a defeater of a (if v.weight() ≥ a.weight()). On the other hand, such
               node v could instead have changed its state to not accepted, so in that case the
               message sent to a will be also of type attacker but with value (v.weight(), −1),
               indicating that its defeaters counter should be decreased by one if v is a defeater
               of a. The possible addition or subtraction to its defeaters counter, caused by
               a message msg sent by a node v is checked (and performed when needed) by
               calling the function a.updateDef eaters(msg.value), where the value has the
               format indicated previously. Then, after updating the defeaters counter the state
                4
                    The pseudocode is written using object oriented notation, as the Pregel API is
                    written in C++. However, our actual implementation is based on the Pregel imple-
                    mentation found on the Spark distributed programming framework, graphX, that is
                    written in Scala.


                                                         53
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               of a is updated calling the function a.updateAcceptanceState(), that checks if
               the updated defeaters counter is greater than zero. Finally, we call the function
               a.StateChanged() (in line 6) to check whether the state of a changed (from not
               accepted to accepted or viceversa). If it changed, a sends an attacker message
               to all its outgoing neighbors with value (a.weight(), a.isAccepted()), where the
               function a.isAccepted() will return either +1 or -1 depending on whether the
               state of a is accepted or not accepted. These sent messages will be processed
               by the outgoing neighbors of a when the next superstep begins. If the state of
               a did not change, then a votes towards finishing the distributed computation
               of the solution (line 9), but only when all the nodes agree to finish, in a same
               superstep, will the algorithm finish.
                   Consider the execution of the algorithm when solving the example discussion
               we have presented in Section 2, and whose solution is shown on Figure 2:

                1. Before the first superstep, all the nodes start in the not accepted state and
                   with defeaters counter equal to zero, so in the first superstep, as there are
                   no incoming messages for any node, all of them will change their state to
                   accepted. Then, any node with outgoing edges (all except 3 and 0), sends a
                   message to its attacked nodes with value (node.weight(), +1). Observe that
                   nodes 4, 7, 8, 9, 11 and 12 will remain accepted for the rest of the execution
                   of the algorithm, as they will never receive any messages.
                2. In the second superstep, the received messages produce that nodes 0, 1, 2, 3, 5, 6
                   and 10 change to not accepted, as they receive messages from attacking ac-
                   cepted nodes that defeat them, so each one of these nodes (except 0 and 3)
                   will send a message with value (node.weight(), −1). Nodes 2, 3, 5, 6 and 10
                   will remain not accepted for the rest of the execution of the algorithm.
                3. In the third superstep, the messages sent in the previous superstep produce
                   that nodes 0 and 1 change to accepted, and so node 1 will send a message
                   to node 0 with value (1.weight(), +1). Node 1 will remain accepted for the
                   rest of the execution.
                4. Finally, in the fourth superstep the message received by node 0 from 1, as 1
                   is a defeater for 0, will change the state of 0 to rejected, being this its final
                   state and the end of the execution of the algorithm as no more messages are
                   sent by any nodes.

               Observe that the number of supersteps is equal to the maximum path length of
               the discussion graph, and although one can think about variants of this algorithm
               where less messages are sent, it seems that in the worst case it is not possible the
               have less supersteps than the maximum path length of the discussion graph. So,
               for typical Twitter discussions, where one has many branches in the discussion
               graph but not too deep, this algorithm may solve big discussions.
                   We have started to evaluate the performance of this algorithm on real Twit-
               ter discussions of different sizes, working with a small Spark cluster with five
               computers working with Linux. A table with preliminary results for a test set
               of Twitter discussions can be found at the URL: https://www.dropbox.com/
               s/u4yrz5an78d6dfw/lasttable.pdf?dl=0.


                                                        54
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               5    Conclusions and future work

               In this paper we introduce a distributed system for mining the set of globally
               accepted tweets of a Twitter discussion. We model discussions with a weighted
               argumentation graph, where each node denotes a tweet, each edge denotes a
               criticism relationship between a pair of tweets of the discussion and each node
               is attached with a weight, that denotes the social relevance of the corresponding
               tweet in the discussion and it is computed from some tweet’s attributes, such as
               the number of followers of the author, the number of retweets and the number
               of favorites.
                   The set of accepted tweets is defined following an skeptical approach based on
               the ideal semantics of a valued argumentation framework. The ideal semantics
               for valued argumentation guarantees that the set of tweets in the solution is the
               maximal set of tweets that satisfies that it is consistent, in the sense that there
               are no defeaters among them, and that all of the tweets outside the solution are
               defeated by a tweet within the solution.
                   Our distributed strategy is based on the distributed computation model of
               Pregel which is appropriate for our problem, because the input for a Pregel
               algorithm is a directed graph, where the nodes can be in different states, and
               the goal is to compute the state of each node based on the state of the nodes’
               neighbors.
                   In our system each node (tweet) can be in two states, accepted or not ac-
               cepted (rejected), and the algorithm works only with discussion graphs that are
               acyclic, such that it allows us to solve discussions of big size with an efficient
               polynomial time distributed algorithm, where the state of each node depends
               only on the state of its attacking nodes. As far as we know, the system is the
               first distributed implementation of an argumentative reasoning algorithm for
               social network analysis.
                   As future work, we plan to extend the representation model to consider sup-
               port relationships between tweets and also to explore an efficient implementation
               of a distributed algorithm for bipartite graphs.


               References
                1. Alsinet, T., Argelich, J., Béjar, R., Fernández, C., Mateu, C., Planes, J.: Weighted
                   argumentation for analysis of discussions in Twitter. Int. J. Approx. Reasoning 85,
                   21–35 (2017)
                2. Bench-Capon, T.J.M.: Persuasion in practical argument using value-based argu-
                   mentation frameworks. Journal of Logic and Computation 13(3), 429–448 (2003)
                3. Bench-Capon, T.J.M., Dunne, P.E.: Argumentation in artificial intelligence. Artif.
                   Intell. 171(10-15), 619–641 (2007)
                4. Besnard, P., Hunter, A.: A logic-based theory of deductive arguments. Artif. Intell.
                   128(1-2), 203–235 (2001)
                5. Bild, D.R., Liu, Y., Dick, R.P., Mao, Z.M., Wallach, D.S.: Aggregate characteri-
                   zation of user behavior in Twitter and analysis of the retweet graph. ACM Trans.
                   Internet Techn. 15(1), 4:1–4:24 (2015)


                                                          55
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


                6. Bird, S.: NLTK: the natural language toolkit. In: Proceedings of the 21st Inter-
                   national Conference on Computational Linguistics and 44th Annual Meeting of
                   the Association for Computational Linguistics, ACL 2006, Sydney, Australia. pp.
                   17–21 (2006)
                7. Dung, P.M.: On the acceptability of arguments and its fundamental role in non-
                   monotonic reasoning, logic programming and n-person games. Artificial Intelligence
                   77(2), 321 – 357 (1995)
                8. Dung, P.M., Mancarella, P., Toni, F.: Computing ideal sceptical argumentation.
                   Artif. Intell. 171(10-15), 642–674 (2007)
                9. Dunne, P.E.: The computational complexity of ideal semantics I: abstract argu-
                   mentation frameworks. In: Proceedings of Computational Models of Argument,
                   COMMA 2008, Toulouse, France. pp. 147–158 (2008)
               10. Egly, U., Gaggl, S.A., Woltran, S.: Aspartix: Implementing argumentation frame-
                   works using answer-set programming. In: Proceedings of the 24th International
                   Conference on Logic Programming, ICLP 2008. pp. 734–738 (2008)
               11. Faber, W., Woltran, S.: Manifold answer-set programs for meta-reasoning. In: Pro-
                   ceedings of Logic Programming and Nonmonotonic Reasoning, LPNMR 2009. pp.
                   115–128 (2009)
               12. Hansen, L.K., Arvidsson, A., Nielsen, F.A., Colleoni, E., Etter, M.: Good friends,
                   bad news - affect and virality in Twitter. In: International Workshop on Social
                   Computing, Network, and Services (SocialComNet 2011) (2011)
               13. Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N.,
                   Czajkowski, G.: Pregel: a system for large-scale graph processing. In: Proceedings
                   of the ACM SIGMOD International Conference on Management of Data, SIGMOD
                   2010. pp. 135–146 (2010)
               14. Nielsen, F.A.: A new anew: Evaluation of a word list for sentiment analysis in
                   microblogs. In: Proceedings of the ESWC2011 Workshop on ’Making Sense of Mi-
                   croposts’. pp. 93–98 (2011)
               15. Rahwan, I., Simari, G.R.: Argumentation in Artificial Intelligence. Springer Pub-
                   lishing Company, 1st edn. (2009)
               16. Valiant, L.G.: A bridging model for multi-core computing. J. Comput. Syst. Sci.
                   77(1), 154–166 (2011)


                                                        56

</pre>