=Paper=
{{Paper
|id=Vol-2048/paper10
|storemode=property
|title=Detecting Type of Persuasion : Is there Structure in Persuasion Tactics?
|pdfUrl=https://ceur-ws.org/Vol-2048/paper10.pdf
|volume=Vol-2048
|authors=Rahul R. Iyer,Katia Sycara,Yuezhang Li
|dblpUrl=https://dblp.org/rec/conf/icail/IyerSL17
}}
==Detecting Type of Persuasion : Is there Structure in Persuasion Tactics?==
Rahul R. Iyer, Katia Sycara, Yuezhang Li
Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213
rahuli@andrew.cmu.edu, katia@cs.cmu.edu, yuezhanl@andrew.cmu.edu

In: Floris Bex, Floriana Grasso, Nancy Green (eds), 18th Workshop on Computational Models of Natural Argument, 16th July 2017, London, UK.

ABSTRACT

Existing work on detecting persuasion in text makes use of lexical features for detecting persuasive tactics, without taking advantage of the possible structures inherent in the tactics used. In this paper, we propose a multi-class, unsupervised, domain-independent model for detecting the type of persuasion used in text, which makes use of the sentence structure inherent in the different persuasion tactics. Our work shows promising results compared to existing work and to vector-embedding models.

KEYWORDS

persuasion detection, multi-class classification, text mining, unsupervised learning

1 INTRODUCTION

Persuasion is used in every type of forum these days, from politics and the military to social media. Detecting persuasion in text helps address many challenging problems: analyzing chat forums to find grooming attempts by sexual predators; training salespeople and negotiators; and developing automated sales-support systems. Furthermore, the ability to detect persuasion tactics in social flows such as SMS and chat forums can enable targeted and relevant advertising. Additionally, persuasion detection is very useful in detecting spam campaigns and promotions on social media, especially those that relate to terrorism. Persuasion identification is also potentially applicable to broader analyses of interaction, such as the discovery of those who shape opinion, or of the cohesiveness and openness of a social group.

Existing work on detecting persuasion in text focuses mainly on lexical features, without taking advantage of the inherent structure present in persuasion tactics. In this work, we attempt to build an unsupervised, domain-independent model for detecting persuasion tactics that relies on the sentence structures of the tactics. Our contributions to the literature are: 1) we show that persuasive tactics have inherent sentential structures that can be exploited; 2) we propose an unsupervised approach that does not require annotated data; 3) we propose a way to synthesize prototype strings for the different persuasion tactics; 4) our approach takes much less time to execute than models that require training (for example, it is faster than Doc2Vec by a factor of almost 1.5); 5) our approach is domain-independent, in that it is independent of the vocabulary and can be applied to various domains such as politics, blogs, supreme court arguments etc. with very minimal changes, unlike vector-embedding models and other models that make use of lexical features, which depend on the vocabulary. We compare our proposed approach with existing methods that use lexical features, as well as with some vector-embedding models, such as Doc2Vec [10].

We had an intuition that arguments in the persuasive space have similar sentential structures, based on examples we had observed. Consider two examples in the Reason category: 1a) "Are we to stoop to their level just because of this argument?", 1b) "I am angry at myself because I did nothing to prevent this"; and two examples in the Scarcity category: 2a) "Their relationship is not something you see everyday", 2b) "It is only going to go downhill from here". As we can see, there is a pattern in the structure of arguments in the same category, and there is a structural difference across the two categories. This led us to investigate the problem further and formulate our hypothesis.

The rest of the paper is organized as follows: Section 1.1 discusses related work; section 2 explains the problem we are trying to tackle; section 3 describes the different datasets used in the paper; section 4 explains the proposed model and the baselines; section 5 discusses the experimental results; section 6 goes over some brief applications of the model; and section 7 concludes the paper with a discussion and future work.

1.1 Related Work

There has been some work in the literature on the detection of persuasion in text. In [22], Young et al. present a corpus for persuasion detection derived from hostage negotiation transcripts. The corpus, called the "NPS Persuasion Corpus", consists of 37 transcripts from four sets of hostage negotiation transcriptions. Cialdini's model [5] was used to hand-annotate each utterance in the corpus, with nine categories of persuasion: reciprocity, commitment, consistency, liking, authority, social proof, scarcity, other, and non-persuasive. Algorithms such as Naive Bayes, SVM, and Maximum Entropy were then used for classification.

Gilbert [8] presented an annotation scheme for a persuasion corpus. A pilot application of this scheme showed some agreement between annotators, but not a very strong one. After the annotation scheme was revised, a more extensive study showed significant agreement between annotators. The authors of [14] determined that it is possible to automatically detect persuasion in conversations using three traditional machine learning techniques: naive Bayes, maximum entropy, and support vector machines.
Bikel and Soren used machine learning techniques to differentiate between differing opinions [2]. They report an accuracy of 89% when distinguishing between 1-star and 5-star consumer reviews, using only lexical features. Anand et al. [1] describe the development of a corpus of blog posts annotated for the presence of attempts to persuade and the corresponding tactics employed in persuasive messages. The authors make use of lexical features such as unigrams, topic features from LDA, and list count features from the Linguistic Inquiry and Word Count [7], as well as the tactics themselves, which are provided by human annotators. Tactics represent the type of persuasion being employed: social generalization, threat/promise, moral appeal etc. Carlo et al. [18] analyze political speeches in a machine learning framework, classifying the transcripts of political discourses according to their persuasive power and predicting the sentences that trigger applause in the audience. In [19], Tan et al. look at the wording of a tweet to determine its popularity, as opposed to the general notion of author/topic popularity. The computational methods they propose perform better than an average human. In [12], Lukin et al. determine the effectiveness of a persuasive argument based on the audience reaction. They report a set of experiments testing at large scale how audience variables interact with argument style to affect the persuasiveness of an argument.

In addition to text, there has been some work on persuasion in the multimedia domain. In [17], Siddiquie et al. work on the task of automatically classifying politically persuasive videos and propose a multi-modal approach. They extract audio, visual, and textual features that attempt to capture affect and semantics in the audio-visual content and sentiment in the viewers' comments. They work on each of these modalities separately and show that combining all of them works best. For their experiments, they use the Rallying a Crowd (RAC) dataset, which consists of over 230 videos from YouTube comprising over 27 hours of content. Chatterjee et al. [4] aim to detect persuasiveness in videos by analyzing the speaker's verbal behavior, specifically lexical usage and paraverbal markers of hesitation (a speaker's stuttering or breaking his/her speech with filled pauses, such as "um" and "uh"). Paraverbal markers of hesitation have been found to influence how other people perceive the speaker's persuasiveness. The analysis is performed on a multimedia corpus of 1000 movie review videos annotated for persuasiveness. Park et al. collected and annotated a corpus of movie review videos in [15]. From this data, they demonstrate that the verbal and non-verbal behavior of the presenter is predictive of how persuasive they are, as well as of the cooperative nature of a dyadic interaction.

Tasks similar to persuasion detection have also been explored, such as sentiment detection and perspective detection. Lin et al. investigated the idea of perspective identification at the sentence and document level [11]. Using the articles from the bitterlemons website (http://www.bitterlemons.org), they were able to discriminate between Palestinian authors and Israeli authors who had written about the same topic.

2 PROBLEM FORMULATION

2.1 Preliminaries

Persuasion is an attempt to influence a person's beliefs, attitudes, intentions, motivations, or behaviors. In persuasion, one party (the 'persuader') induces a particular kind of mental state in another party (the 'persuadee'), for example through flattery or threats; but unlike expressions of sentiment, persuasion also involves a potential change in the mental state of the other party. Contemporary psychology and communication science further require the persuader to be acting intentionally. Correspondingly, any instance of (successful) persuasion is composed of two events: (a) an attempt by the persuader, which we term the persuasive act, and (b) subsequent uptake by the persuadee. In this work, we consider only (a), the different persuasive acts, and how to detect them; working with (b) is a whole other problem. Throughout the rest of the paper, when we say persuasive arguments, we mean the former, without taking the effectiveness of the persuasion into account. We are only interested in whether the arguments contain persuasion, and if so, the type.

2.2 Problem Statement

The main objective of our work is to detect whether a given piece of text contains persuasion. If it does, we can then look into the type of persuasion strategy being used, such as threat/promise, outcome, reciprocity etc. In this paper we look at 14 different persuasion strategies, listed in Table 1. These are the common tactics for persuasive acts contributed by Marwell and Schmitt [13] and Cialdini [5], as well as argumentative patterns inspired by Walton et al. [21]. The intuition behind this investigation, along with some examples, was discussed in section 1.

It has to be noted that an entire text is deemed to contain persuasion if it includes a few arguments that use some of these tactics to persuade. It is therefore important to extract such arguments from the text before applying the persuasion model to them. Our approach thus has two steps: 1) a very simple argument extractor model, which extracts arguments from a given piece of text, and 2) a persuasion detection model, which takes the output of the extractor and classifies the arguments into the different tactic classes. This process is represented in the flowchart shown in Figure 1. Note again that, in this work, we are not concerned with the effectiveness of the persuasion.

[Figure 1: The outline of the problem considered]

Table 1: List of persuasion tactics used, taken from [1]. The broad categories are Outcomes, Generalizations, External, Interpersonal, and Other; we do not consider the broad categories in our experiments, and only work with the finer categories.

- Outcome: mentions some particular consequences from uptake or failure to uptake.
- Social Esteem: states that people the persuadee values will think more highly of them.
- Threat/Promise: poses a direct threat or promise to the persuadee.
- Self-Feeling: states that uptake will result in a better self-valuation by the persuadee.
- Good/Bad Traits: associates the intended mental state with a "good" or "bad" person's traits.
- Deontic/Moral Appeal: mentions duties or obligations, moral goodness or badness.
- Empathy: attempts to make the persuadee connect with someone else's emotional perspective.
- Scarcity: mentions rarity, urgency, or opportunity of some outcome.
- VIP: appeals to authority (bosses, experts, trendsetters).
- Popularity: invokes popular opinion as support for uptake.
- Favors/Debts: mentions returning a favor or injury.
- Consistency: mentions keeping promises or commitments.
- Recharacterization: reframes an issue by analogy or metaphor.
- Reasoning: provides a justification for an argumentative point based upon additional argumentation schemes, e.g., causal reasoning or arguments from absurdity.

3 DATASETS

We have used datasets from different domains in our experiments, to test the robustness of our model. The datasets used for persuasion detection are:

1. ChangeMyView: an active community on Reddit that provides a platform where users present their own opinions and reasoning, invite others to contest them, and acknowledge when the ensuing discussions change their original views. The training data period is 2013/01/01 - 2015/05/07, and the test data period is 2015/05/08 - 2015/09/01. The training dataset contains 3456 posts and the holdout dataset contains 807 posts. The dataset is organized as follows: each post written by a user who wants his views changed has two argumentative threads, one that is successful and one that is not. This has been used to determine the persuasion strategies employed by the successful thread. This dataset is used in [20].

2. Supreme Court Dialogs Corpus: a collection of conversations from U.S. Supreme Court oral arguments: 1) 51,498 utterances making up 50,389 conversational exchanges, 2) from 204 cases involving 11 Justices and 311 other participants, 3) metadata such as case outcome, vote of the Justice, gender annotation etc. This dataset is used in [6].

3. Blog Authorship Corpora: this dataset, contributed by Pranav Anand et al. and used in [1], is a subset of the blog authorship corpus. Each directory corresponds to a blog, and each blog has sub-directories corresponding to days. Inside each day sub-directory there may be multiple posts, identified with an underscore followed by a number. Out of around 25048 posts, only around 457 were annotated with persuasive acts. Each blog post has been broken down into different "text tiles", each a few sentences long, and each of these tiles is annotated with a persuasive tactic (if present).

4. Political Speeches: we collected a number of speeches by Donald Trump and Hillary Clinton, to analyze the kinds of persuasive tactics they use.

To train the argument extraction model, we use an Argumentation Essay Dataset (https://www.ukp.tu-darmstadt.de/data/argumentation-mining/argument-annotated-essays-version-2/), which consists of about 402 essays. There are two files for each essay: the original essay and the annotated file. Annotations include breaking down the essay into different components: claim, premise, stance. This can be used to train a simple classifier that identifies arguments in text passages.

The Blog Authorship Corpus is already annotated, as noted above. In order to have the ground truth, i.e. the annotations, for the other datasets, we needed to annotate the arguments of the corpora with the persuasion tactics listed in Table 1. For this, we used Amazon Mechanical Turk (https://www.mturk.com/mturk/). Using the argument extraction model, we extracted arguments from all of the corpora combined (excluding the blog authorship corpus). The whole dataset, along with the annotation guidelines, classification criteria, and the prototype strings (both median and synthetic) for all the persuasion tactics, can be found at https://github.com/rrahul15/Persuasion-Dataset. We had each argument annotated by two different turkers, and the turkers were given the freedom to classify a piece of text either as a non-argument or as one of the tactics from Table 1. There was about 65% inter-annotator agreement between the turkers, and conflicts were resolved manually. After this, we had a total of 1457 persuasive arguments from all the datasets combined. The distribution of arguments from the different datasets is given in Table 2. The guidelines for annotation were built on the ones provided in [1], with some changes.
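The annotation setup just described (two turkers per argument, agreement measured, conflicts resolved manually) can be made concrete with a small sketch. This is illustrative code, not the authors'; the labels are hypothetical, and raw percent agreement is assumed as the agreement measure, since the paper does not name a specific statistic.

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)

# Hypothetical annotations for four extracted arguments; each label is
# a tactic from Table 1 or "non-argument".
turker_1 = ["Reasoning", "Scarcity", "Outcome", "non-argument"]
turker_2 = ["Reasoning", "Empathy", "Outcome", "non-argument"]

print(percent_agreement(turker_1, turker_2))  # 0.75
```

Arguments on which the annotators disagree (here, the second one) would then be resolved manually, as described above.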
Table 2: Distribution of arguments from the different datasets.

Dataset: # Arguments
ChangeMyView: 362
Supreme Court: 440
Political Speeches: 198
Blog: 457

4 TECHNICAL APPROACH

In this section, we describe our proposed models, along with a couple of baselines for comparison. It has to be kept in mind, as noted in section 3, that the dataset used consists solely of persuasive arguments.

4.1 Baselines

Here, we discuss the different baselines that we use for comparison. We describe a simple supervised approach that makes use of lexical features and then move on to more complicated models involving vector embeddings. In all the supervised approaches, we use an 80:20 split for training and testing.

(1) Simple Supervised: the learning phase involves extracting simple textual features from the training set (unigrams and bigrams, without punctuation) and then training an SVM (Support Vector Machine) model using Sequential Minimal Optimization (SMO) [16] to learn a model from these features. This model was then used to classify the remaining posts in the holdout set.

(2) Supervised Document Vectors: this method uses the Doc2Vec model proposed by Le and Mikolov [10]. First, the arguments were separated into different categories based on the persuasion tactic. Then the Doc2Vec model was applied to each such cluster to embed all the arguments into vectors, and the prototype vector for each category was chosen as the mean of all the vectors in that category. To classify the holdout set, one computes the vector of the argument in consideration and then computes the cosine similarity to the prototype vectors; the category with the highest similarity is chosen. The cosine similarity between two vectors a and b is defined as:

    similarity = (a · b) / (||a|| ||b||)    (1)

(3) We also compare our approach with that proposed by [1]. Here, the authors make use of different features to account for fewer word-dependent features: a) 25 topic features, extracted using Latent Dirichlet Allocation (LDA) [3] with a symmetric Dirichlet prior, and b) 14 tactic count features, i.e., a vector consisting of the counts of the tactics. Naive Bayes was used for the classification, to assess the degree to which these feature sets complement each other.

4.2 Proposed Approach

In this section, we describe the proposed unsupervised, domain-independent approach to identify the persuasion tactics in a given set of arguments. By domain-independence, we mean that our proposed model is robust across different genres, be it political speeches or blogs; this is important because different domains may have their own vocabulary. Before heading into the details of the algorithm, we present a few useful definitions.

4.2.1 Preliminaries. In this subsection, we describe a few preliminary concepts.

Parse Tree: a parse tree is an ordered, rooted tree that represents the syntactic structure of a sentence according to some context-free grammar. It captures the sentence structure: multiple sentences can have a similar structure even if their vocabularies differ. This is the essence of our approach.

Edit Distance: edit distance quantifies how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. Given two strings a and b on an alphabet Σ, the edit distance d(a, b) is the minimum number of edit operations that transforms a into b. The edit operations are: 1) insertion of a single symbol, 2) deletion of a single symbol, and 3) substitution of a single symbol for another.

Median String: the median string of a set of strings is the element of the set that has the smallest sum of distances to all the other elements [9]. In our case, the distance between strings is the edit distance.

4.2.2 Parse-Tree Model. We propose a domain-independent classification. The idea is that persuasive arguments may have certain characteristic sentence structures, which we might be able to exploit. The training and testing phases are given below.

Training Phase

(1) As mentioned earlier, we have 14 different categories of persuasive tactics. We obtain one representative prototype argument for each category, in two different ways, which we discuss after detailing the algorithm.
(2) We then perform phrase-structure parsing on each of these prototype arguments to obtain their parse trees, which give the structure of the argument.
(3) These parse trees are then converted into parse strings, keeping the structure intact; the leaf nodes (the terminal symbols, namely the words) are removed, to get a domain-independent representation of the structure of the argument.
(4) By now, we have a representative prototype parse string for each persuasive tactic, i.e. 14 different prototype parse strings. We use these strings to classify a new argument into one of the persuasive categories. As mentioned earlier, every instance in the dataset is a persuasive argument. This can be construed as the "training phase".

Testing Phase

(1) In the testing phase, we have to classify a new argument into one of the categories. Since each new argument in the dataset is persuasive, we do not have to worry about non-arguments; we build a model to account for non-arguments in section 6.
(2) Given a new argument, compute its parse string, following the same procedure used to obtain the parse strings of the prototype arguments.
(3) To classify this argument, compute the normalized edit distances (normalized by the lengths of the strings) between its parse string and the prototype parse strings of each category.
(4) The persuasive category with the least edit distance is the most structurally similar to the given argument, and the argument is therefore classified into that category.
(5) This process is explained in the flowchart given in Figure 2.

Choosing the Prototype Strings. We propose two methods to obtain the prototype argument strings.

(1) Median as the Prototype: take a set of arguments from each persuasion category and use the median string of the set as the prototype string for that category. We then have to determine the ideal size of the set in question. For obvious reasons, we get the best representation if we consider all arguments of that category, but this would require a completely annotated dataset (making the model supervised). In order to determine the ideal set size, we conduct additional experiments on a particular dataset, the supreme court dataset, with different parameter values to observe the trend of the performance. We choose different set sizes: 2%, 5%, 10%, 20%, 30%, and All (the set sizes are percentages of the total number of arguments in that category). For each set size, we conduct 5 different trials, choosing a random sample each time, and observe the average performance. This trend is shown in Figure 3. As we can see, the performance across successive trials stabilizes as we increase the set size. The performance is best when we consider all the arguments, and it is quite stable and close to the best when we consider 30% of all the arguments. We therefore settle on 30% as the ideal set size, because we get stable performance with a very small loss in accuracy and much fewer arguments. Examples of the prototype parse strings obtained with the median method for two different persuasion tactics, Reasoning and Scarcity, are given in Figure 7; for purposes of brevity, we display prototypes for only two of the tactics.

Now, although different arguments in the same category are structurally similar, each may have certain parts in its structure that capture the essence of that category much better. We rule out taking advantage of these individual segments when we pick one median argument out of the set. This led us to a second method of obtaining prototype strings.

(2) Synthetic Prototype: as noted above, certain segments in different arguments of the same category may capture the essence of the category better. To accommodate this, we chop up the different arguments in a set into a number of segments and choose different segments to synthesize an artificial prototype string. To obtain the best i-th segment for the synthetic string, we choose the median of the i-th segments of all strings in the set. Note that we chop the strings uniformly. This process of synthesizing the prototype string is illustrated in Figures 4 and 5. As before, we need to tune some parameters: in addition to the optimal set size, we also need to determine the optimal number of segments.

In order to determine the optimal number of segments and the set size, we conduct additional experiments on the supreme court dataset with different parameter values, to observe the trend as before. We choose the same set sizes (2%, 5%, 10%, 20%, 30%, and All) and different numbers of segments: 2, 3, 5, 7, and 9. For each (set size, number of segments) pair, we conduct 5 trials, choosing a random sample for the sets each time, and compute the average performance. This trend is shown in Figure 6; we show only the average across trials rather than the individual trials. We see that the trend stabilizes as we increase the set size, as before, and that the accuracy improves as we consider more segments. Here, however, there is a tradeoff between accuracy and speed, because a large number of segments requires us to compute the median of every segment. We settle on 30% as the ideal set size and 9 as the ideal number of segments.
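The string machinery of Sections 4.2.1-4.2.2 (edit distance, median string, and the two prototype-selection methods) can be sketched as follows. This is a minimal illustration rather than the authors' implementation; normalizing by the longer string's length and uniform ceiling-division segmenting are our assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def normalized_edit_distance(a: str, b: str) -> float:
    """Edit distance scaled by string length (assumed: the longer one)."""
    return edit_distance(a, b) / max(len(a), len(b), 1)

def median_string(strings):
    """Method 1 (median as prototype): the element of the set with the
    smallest sum of edit distances to all other elements."""
    return min(strings, key=lambda s: sum(edit_distance(s, t) for t in strings))

def uniform_segments(s: str, k: int):
    """Chop s into k segments of (roughly) equal length."""
    step = -(-len(s) // k)  # ceiling division
    return [s[i * step:(i + 1) * step] for i in range(k)]

def synthetic_prototype(strings, k: int) -> str:
    """Method 2 (synthetic prototype): for each segment position i, take
    the median of the i-th segments of all strings, then concatenate."""
    segmented = [uniform_segments(s, k) for s in strings]
    return "".join(median_string([segs[i] for segs in segmented])
                   for i in range(k))

print(edit_distance("kitten", "sitting"))                # 3
print(median_string(["abc", "abd", "xyz"]))              # abc
print(synthetic_prototype(["aaxx", "abyy", "aazz"], 2))  # aaxx
```

In the paper, these operations are applied not to raw text but to parse strings of arguments drawn from a single tactic category.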
The prototype strings produced by this method are not parse strings of meaningful sentences, and so we do not display them here.

[Figure 3: Trend graph for the median-as-prototype string]

[Figure 5: Synthesis of the prototype string. The coloured segments are the medians of those string segments. In this case the median of each segment comes from a different string; this is just for illustrative purposes, and need not always be the case.]

5 EXPERIMENTAL RESULTS

In this section, we present the results of our proposed model and compare it with the different approaches described earlier. In the following section, we discuss the metrics that we have used for evaluation.

5.1 Evaluation Metrics

The metrics used for evaluation are listed below.

(1) Precision: the percentage of arguments the system identified as having a particular tactic that in fact had that tactic:

    Precision_t = |{retrieved documents} ∩ {relevant documents}| / |{retrieved documents}|    (2)

where Precision_t is the precision for tactic t.

(2) Recall: the percentage of arguments of a particular tactic that the system classified into that category:

    Recall_t = |{retrieved documents} ∩ {relevant documents}| / |{relevant documents}|    (3)

where Recall_t is the recall for tactic t.

(3) F1-measure: the harmonic mean of precision and recall:

    F1_t = (2 × Precision_t × Recall_t) / (Precision_t + Recall_t)    (4)

where F1_t is the F1 measure for tactic t.

It is important to note that the precision, recall, and F1 measure are computed for each persuasion tactic separately, akin to a binary classifier. We report the mean of these measures over all the tactics in our experiments.

5.2 Results

First, we run the proposed parse-tree model on the arguments extracted from the datasets and obtain the average per-category accuracy. The per-category accuracy is defined as the percentage of accurate classifications for a specific category; the categories in our case are the persuasion tactics.
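The per-tactic metrics of Section 5.1, computed one-vs-rest and then macro-averaged as described, can be sketched as follows (illustrative code, not the authors'; the labels are hypothetical):

```python
def per_tactic_scores(gold, predicted):
    """One-vs-rest precision, recall, and F1 per tactic, as in
    equations (2)-(4); each tactic is scored like a binary classifier."""
    scores = {}
    for t in set(gold):
        retrieved = sum(p == t for p in predicted)  # classified as t
        relevant = sum(g == t for g in gold)        # truly t
        hits = sum(g == p == t for g, p in zip(gold, predicted))
        precision = hits / retrieved if retrieved else 0.0
        recall = hits / relevant if relevant else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[t] = (precision, recall, f1)
    return scores

def macro_average(scores):
    """Mean precision, recall, and F1 over all tactics (the reported numbers)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))

gold = ["Reasoning", "Scarcity", "Reasoning", "Outcome"]
pred = ["Reasoning", "Reasoning", "Reasoning", "Outcome"]
print(macro_average(per_tactic_scores(gold, pred)))
```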
this task, we combined the arguments from each of the 4 datasets to form a combined set, in order to get an average performance estimate (refer to Table 2 for the distribution of arguments in each dataset). We classified the arguments in the combined set and calculated the fraction of correct classifications for each category. The results are given in Table 3. We do not consider the broad categories in our experiments, and only work with the finer categories. We also compute the distribution of the tactics in the different datasets. We do this by classifying the arguments in each of the 4 datasets, and calculating the frequency of appearance of each tactic in the corpus as a percentage over all the arguments in that corpus. These are listed in Tables 4-7. The ranking of the tactics in these tables, with respect to the percentages, aligns closely with manual evaluations. These distributions are shown just to give an idea of the Figure 6: Trend Graph for Synthetic Prototype string ranking of the tactics, as predicted by the algorithm (which makes sense intuitively). ChangeMyView: Each user posts his/her stance on a particular topic and challenges others to change his opinion. 5 EXPERIMENTAL RESULTS For example, one of the posts was about a man who did not In this section, we present the results of our proposed model believe in essential-oils and believed that they were destruc- and compare it with the different approaches described earlier. tive, whereas his wife believed the oils were beneficial. He In the following section, we discuss the metrics that we have requested the other users to make him change his mind about used for evaluation. essential oils by giving him sufficient evidence. 
If a person is successful in changing the mind of the OP (Original Poster), the OP gives that person a delta in their comments. All the conversations are monitored by Reddit, and hence the quality is high. In our dataset, for each post, there are two threads of comments: one that was successful in changing the mind of the OP and one that was unsuccessful. We analyzed the persuasive strategies used in the successful threads, because these are examples of effective uses of the different persuasion tactics. For our purposes of classifying tactics, we could also have used the unsuccessful threads (we are not concerned with the uptake of the persuasion by the persuadee), but we chose not to. First, we extracted the positive comments from the threads (those that were given a delta by the OP). We then applied the parse-tree persuasion model developed earlier to these texts to perform the classification. Many of the comments had links to other credible sources listing facts that opposed the OP's view; we did not venture into these links. After determining the persuasion strategies used in the comments, we observed that Reasoning and Outcomes were the most frequently used strategies. A more detailed distribution of tactics is given in Table 4.

18th Workshop on Computational Models of Natural Argument, Floris Bex, Floriana Grasso, Nancy Green (eds), 16th July 2017, London, UK

Figure 2: The parse-tree model explained

Figure 4: Outline for Synthesis of the Prototype strings

Figure 7: Examples of parse trees with the parse strings for 2 persuasion tactics, Scarcity and Reason. (a) Scarcity sentence: "Their relationship is not something you see everyday". (b) Reasoning sentence: "I'm angry because of this, I did NOTHING". This is just an illustration of what the parse strings look like.

Category              Accuracy
Reasoning             79.8
Deontic/Moral Appeal  69.6
Outcome               65.7
Empathy               61.3
Threat/Promise        58.2
Popularity            56.4
Recharacterization    54.9
VIP                   53.5
Social Esteem         50.3
Consistency           45.6
Favors/Debts          41.1
Self-Feeling          37.7
Good/Bad Traits       35.5
Scarcity              29.6

Table 3: Per-category accuracy for the parse-tree model, when 14 categories are used.

Tactic           Percentage
Reasoning        40.7
Outcomes         41.2
Good/Bad Traits  10.0
Social           8.1

Table 4: Distribution of Persuasion Tactics used in the ChangeMyView dataset.

Supreme Court Dataset: This dataset includes the transcripts of the conversational exchanges in 204 cases, along with the outcome of each case. The outcome is either Respondent or Petitioner: the petitioner is the person who files the petition against a particular party, requesting action on a certain matter, and the respondent is the person against whom the said relief is sought. We collected all the cases where the petitioner won and analyzed the argument structure. Using the argumentation model, we identified the key arguments, and then, using the parse-tree model, we classified the type of argument that was used. Most of the presented arguments were found to be Deontic Appeal and Reasoning. The distribution of arguments is given in Table 5.

Tactic              Percentage
Deontic Appeal      33.3
Reasoning           35.5
Recharacterization  12.6
Outcome             8.6
Empathy             5.2
VIP                 4.8

Table 5: Distribution of Persuasion Tactics used in the Supreme Court dataset.

Political Speeches: We analyze the persuasive tactics present in the speeches of political candidates, specifically those of Donald Trump and Hillary Clinton. These distributions are given in Tables 6 and 7. The most frequently used tactic by Trump was Outcome ("Make America Great Again"), while for Hillary the most frequently used tactic was Empathy.

Tactic      Percentage
Outcome     39.1
Principles  31.2
VIP         18.5
Reasoning   11.2

Table 6: Distribution of Persuasion Tactics used in Trump's Speeches.

Tactic        Percentage
Empathy       35.2
Consistency   33.8
Favors/Debts  18.2
Social        12.8

Table 7: Distribution of Persuasion Tactics used in Hillary's Speeches.

Finally, we present the performance of the different algorithms described earlier in Table 8. We also performed these experiments in a binary setting (whether a given argument contains persuasion or not); these results are presented in Table 9. We ran these experiments on the arguments extracted from each dataset (refer to Table 2 for the distribution of arguments in each dataset). The performances are measured by precision (P), recall (R), and the F1 measure (F), as described earlier.

As can be seen, the domain-independent parse-tree model with synthetic prototype strings performs best, almost 7-8% better than Doc2Vec. This validates our intuition that some segments of arguments in the same category capture the essence of the category better than others. It should be noted that, for a multi-class classification setting, the F1 scores obtained in Table 8 are reasonable, and that our model runs faster than Doc2Vec by a factor of almost 1.5.

5.3 Sensitivity Analysis
We also performed a sensitivity analysis on the parse-tree model to assess its robustness. For this, we combined the arguments from all 4 datasets and ran the model on the combined set: 1) using only 10 instances, 2) using only 100 instances, 3) using 1000 instances, and 4) using all the instances. The prototype argument strings for this set were synthesized according to the method described earlier. The results are given in Table 10. For the cases that did not involve the whole dataset, we randomly sampled 5 times from the whole corpus and averaged the results. As the results show, the proposed model is relatively robust and invariant to the amount of data. Another aspect of sensitivity analysis, involving variation of the model parameters, was discussed earlier in section 4.2.2.

Data  P      R      F
10    0.370  0.375  0.372
100   0.402  0.393  0.397
1000  0.491  0.387  0.432
ALL   0.442  0.412  0.426

Table 10: Sensitivity analysis of the parse-tree model using 10, 100, 1000, and all instances of data.

6 APPLICATIONS
The persuasion-detection model proposed here is versatile and can be applied in many scenarios. A few applications are detailed below.

Basic Argument vs. Non-argument Classifier
The proposed parse-tree model can be used as an argument classifier. To test its applicability, we collected a set of 1000 simple sentences⁵ of fewer than 10 words each. Our intuition was that such simple phrases should have very low similarity to the different persuasion categories, as they are structurally different. We therefore needed to establish a threshold for classifying a piece of text as an argument versus a non-argument: if the normalized edit-distance similarities between the given string and the prototypes of the different categories are all less than the threshold, classify it as a non-argument; else, classify it into its nearest persuasion category. Choosing this threshold is not easy, for these reasons:

1) Higher threshold: lower chance of classifying a non-argument as an argument, but higher chance of classifying an argument as a non-argument.
2) Lower threshold: higher chance of classifying a non-argument as an argument, but lower chance of classifying an argument as a non-argument.

So we needed a threshold that is neither too high nor too low. We tried different thresholds; Figure 8 shows a graph of threshold vs. accuracy, where accuracy is the mean F1 score. As we can see, a threshold of 0.1 works best; for this threshold, the F1 score is 0.412. The performance of this system is not as good as with persuasive arguments alone. Note that this is not a binary argument-vs-non-argument problem: the F1 score presented here is for the problem of 14 persuasion-tactic categories vs. non-arguments.

Figure 8: Graph of Threshold vs. Accuracy (F1 Score)

In this form, the model cannot yet be used as a robust argument classifier, because some non-arguments have structures similar to some of the persuasion tactics described earlier. For example, consider the sentence "The men smoked and most of the women knitted while they talked". Although this is a non-argument, the model could confuse it with one of the persuasion categories, like Reason/Promise. For this reason, we consider only very simple, straightforward non-argument sentences of a few words. To build a classifier with such capabilities, we would need to incorporate domain-independent lexical features into the parse-tree model (more discussion in section 7.1).

⁵ http://www.cs.pomona.edu/~dkauchak/simplification/

Political Speech Analysis
The parse-tree model can also be used to detect spam campaigns on social media, and to detect terrorist campaigns. We analyzed some speeches of Osama Bin Laden to see what kinds of tactics he used to influence people. From the analysis, Empathy was used most. The detailed distribution is given in Table 11.

Tactic              Percentage
Empathy             29.7
Recharacterization  20.3
Reasoning           12.9
Good/Bad Traits     20.2
Scarcity            8.0
Outcome             8.9

Table 11: Distribution of Persuasion Tactics used in Osama Bin Laden's Speeches.

7 DISCUSSION
We have proposed a fairly simple, domain-independent, unsupervised model for detecting types of persuasion used in text. The model can be used in any context or domain because it relies only on the inherent structure of the persuasion tactics, and this versatility gives it a variety of applications. Almost all persuasive arguments can be classified under the categories mentioned earlier. It should be noted that we deliberately did not add lexical features to our model, because that would make the model slightly domain-dependent; for this reason, we focused purely on structural aspects. Of course, as we mention in section 7.1, it might be possible to include a few lexical terms like because, if, while, etc., which are domain-independent but tactic-dependent, to further strengthen our model.

From the obtained results, we see that our model's accuracy is highest for the following persuasion tactics (refer to Table 3): Reasoning, Deontic/Moral Appeal, Outcome, and Empathy. This is in agreement with the observations made by the authors in [1], which further validates our model.

From the metrics computed, we see that the performance of the proposed model with synthesized prototype strings is better than that of vector-embedding models such as Doc2Vec, which uses deep learning. There is almost a 7-8% improvement in performance over Doc2Vec, which, over 14 categories in total, is a reasonable margin. This is very interesting because: 1) the Doc2Vec baseline uses all the data, whereas our model uses only a very small subset of the dataset to compute the initial set of 14 prototype argument strings, one for each category; and 2) the Doc2Vec model requires a training phase that can take a considerable amount of time, given its complex structure. In addition, our model runs faster than Doc2Vec by a factor of almost 1.5, as already mentioned. The reason our model beats methods based on lexical features could be that domain words somehow restrict performance.

We see that the baseline SVM on lexical features had considerably high precision but fell short on recall. That is why we use the F1 measure, which combines both aspects of the model: precision and recall. The most suitable model is the one with a high F1 score, which can only be achieved with high values for both precision and recall. We also see that our model performs better, at both binary and multi-class classification, than the approach used in [1].

Sensitivity analysis was also done to test the robustness of the method. As the results show, the method is fairly stable and robust with respect to the size of the data.

The main takeaway is that complex methods like neural networks may not be the best choice for every task. We have shown that, with a very simple method like the one proposed here, we can achieve performance better than the methods discussed in this paper, while avoiding the high computational cost and the opacity of results of neural-computation-based methods.

7.1 Future Work
There is scope for improving the proposed model further. As of now, we use only the sentence structures of the different persuasion tactics. We have not exploited the fact that there could be some domain-independent words for each tactic, such as because, if, while, etc.; incorporating such keywords into the model could improve performance, and we will investigate this in the future. Additionally, we will apply our approach to different applications, such as detecting spam campaigns, measuring how effective a spam campaign can be (a combination of persuasiveness and connectivity in the network, which can be measured by PageRank), and identifying terrorist campaigns.

8 ACKNOWLEDGEMENTS
This work has been funded by ARO award #W911NF-13-1-0416.

REFERENCES
[1] Pranav Anand, Joseph King, Jordan L Boyd-Graber, Earl Wagner, Craig H Martell, Douglas W Oard, and Philip Resnik. 2011. Believe Me-We Can Do This! Annotating Persuasive Acts in Blog Text. In Computational Models of Natural Argument.
[2] Daniel M Bikel and Jeffrey Sorensen. 2007. If we want your opinion. In International Conference on Semantic Computing (ICSC 2007). IEEE, 493–500.
[3] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022.
[4] Moitreya Chatterjee, Sunghyun Park, Han Suk Shim, Kenji Sagae, and Louis-Philippe Morency. 2014. Verbal behaviors and persuasiveness in online multimedia content. SocialNLP 2014 (2014), 50.
[5] Robert B Cialdini. 2001. Influence: Science and practice. Boston: Allyn & Bacon (2001).
[6] Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang, and Jon Kleinberg. 2012. Echoes of power: Language effects and power differences in social interaction. In Proceedings of the 21st International Conference on World Wide Web. ACM, 699–708.
[7] James W Pennebaker, Martha E Francis, and Roger J Booth. 1993. Linguistic Inquiry and Word Count. Technical Report, Dallas, TX: Southern Methodist University.
[8] Henry T Gilbert. 2010. Persuasion detection in conversation. Ph.D. Dissertation. Naval Postgraduate School, Monterey, California.
[9] Teuvo Kohonen. 1985. Median strings. Pattern Recognition Letters 3, 5 (1985), 309–313.
[10] Quoc V Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In ICML, Vol. 14. 1188–1196.
[11] Wei-Hao Lin, Theresa Wilson, Janyce Wiebe, and Alexander Hauptmann. 2006. Which side are you on? Identifying perspectives at the document and sentence levels. In Proceedings of the Tenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 109–116.
[12] Stephanie Lukin, Pranav Anand, Marilyn Walker, and Steve Whittaker. 2017. Argument Strength is in the Eye of the Beholder: Audience Effects in Persuasion. (2017).
[13] Gerald Marwell and David R Schmitt. 1967. Dimensions of compliance-gaining behavior: An empirical analysis. Sociometry (1967), 350–364.
[14] Pedro Ortiz. 2010. Machine learning techniques for persuasion detection in conversation. Ph.D. Dissertation. Naval Postgraduate School, Monterey, California.
[15] Sunghyun Park, Han Suk Shim, Moitreya Chatterjee, Kenji Sagae, and Louis-Philippe Morency. 2014. Computational analysis of persuasiveness in social multimedia: A novel dataset and multimodal prediction approach. In Proceedings of the 16th International Conference on Multimodal Interaction. ACM, 50–57.
[16] John Platt. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines. (1998).
[17] Behjat Siddiquie, Dave Chisholm, and Ajay Divakaran. 2015. Exploiting Multimodal Affect and Semantics to Identify Politically Persuasive Web Videos. In Proceedings of the 2015 ACM International Conference on Multimodal Interaction. ACM, 203–210.
[18] Carlo Strapparava, Marco Guerini, and Oliviero Stock. 2010. Predicting Persuasiveness in Political Discourses. In LREC.
[19] Chenhao Tan, Lillian Lee, and Bo Pang. 2014. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. arXiv preprint arXiv:1405.1438 (2014).
[20] Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2016. Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 613–624.
[21] Douglas Walton, Christopher Reed, and Fabrizio Macagno. 2008. Argumentation Schemes. Cambridge University Press.
[22] Joel Young, Craig H Martell, Pranav Anand, Pedro Ortiz, Henry Tucker Gilbert IV, et al. 2011. A Microtext Corpus for Persuasion Detection in Dialog. In Analyzing Microtext.

Method         Blogs (P/R/F)        ChangeMyView (P/R/F)  Supreme Court (P/R/F)  Political Speeches (P/R/F)
SVM Baseline   0.594  0.132  0.216  0.511  0.107  0.176   0.605  0.051  0.094    0.454  0.038  0.070
NB+Tactic      0.361  0.483  0.413  0.309  0.465  0.371   0.319  0.511  0.393    0.267  0.417  0.325
NB+LDA         0.098  0.132  0.112  0.071  0.116  0.088   0.083  0.151  0.107    0.032  0.045  0.037
NB+Tactic+LDA  0.114  0.229  0.152  0.099  0.212  0.135   0.138  0.242  0.176    0.041  0.145  0.064
S-Doc2Vec      0.493  0.439  0.464  0.472  0.427  0.448   0.496  0.442  0.467    0.411  0.392  0.401
ParseTree      0.498  0.443  0.468  0.491  0.419  0.452   0.477  0.462  0.468    0.418  0.371  0.393
ParseTree+SP   0.531  0.470  0.498  0.539  0.448  0.489   0.521  0.483  0.501    0.464  0.405  0.432

Table 8: Comparison of results for the different methods, considering all 14 tactics. Here, ParseTree is the first model proposed, ParseTree+SP stands for the parse-tree model with the synthetic prototype strings, NB stands for Naive Bayes, LDA stands for Latent Dirichlet Allocation, and S-Doc2Vec stands for the supervised version of the Doc2Vec method.
NB+Tactic, NB+LDA, and NB+Tactic+LDA are the feature sets used by the authors in [1].

Method         Blogs (P/R/F)        ChangeMyView (P/R/F)  Supreme Court (P/R/F)  Political Speeches (P/R/F)
SVM Baseline   0.741  0.179  0.288  0.721  0.159  0.261   0.763  0.171  0.279    0.651  0.107  0.184
NB+Tactic      0.537  0.672  0.597  0.515  0.645  0.573   0.533  0.656  0.588    0.467  0.599  0.525
NB+LDA         0.133  0.285  0.181  0.111  0.262  0.156   0.137  0.282  0.184    0.051  0.203  0.082
NB+Tactic+LDA  0.169  0.437  0.244  0.144  0.416  0.214   0.154  0.441  0.228    0.097  0.354  0.152
S-Doc2Vec      0.732  0.599  0.659  0.705  0.589  0.642   0.723  0.627  0.672    0.648  0.533  0.585
ParseTree      0.701  0.603  0.648  0.692  0.589  0.636   0.716  0.614  0.661    0.633  0.539  0.582
ParseTree+SP   0.737  0.629  0.679  0.721  0.623  0.668   0.751  0.642  0.692    0.661  0.562  0.607

Table 9: Comparison of results for the different methods in a binary setting. Here, ParseTree is the first model proposed, ParseTree+SP stands for the parse-tree model with the synthetic prototype strings, NB stands for Naive Bayes, LDA stands for Latent Dirichlet Allocation, and S-Doc2Vec stands for the supervised version of the Doc2Vec method. NB+Tactic, NB+LDA, and NB+Tactic+LDA are the feature sets used by the authors in [1].
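The P, R, and F columns in Tables 8 and 9 are tied together by the standard F1 definition, the harmonic mean of precision and recall. A quick sanity check on the ParseTree+SP row for Blogs in Table 8, sketched in Python:

```python
def f1(precision: float, recall: float) -> float:
    """F1 measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# ParseTree+SP on Blogs in Table 8: P = 0.531, R = 0.470, reported F = 0.498.
# Small discrepancies are expected, since the table's F is presumably computed
# from unrounded precision and recall values.
print(round(f1(0.531, 0.470), 3))  # 0.499, matching the reported 0.498 up to rounding
```

The same check can be applied to any row of Table 9; a high F1 requires both components to be high, which is why the high-precision, low-recall SVM baseline scores poorly on F.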