=Paper= {{Paper |id=Vol-2048/paper10 |storemode=property |title=Detecting Type of Persuasion : Is there Structure in Persuasion Tactics? |pdfUrl=https://ceur-ws.org/Vol-2048/paper10.pdf |volume=Vol-2048 |authors=Rahul R. Iyer,Katia Sycara,Yuezhang Li |dblpUrl=https://dblp.org/rec/conf/icail/IyerSL17 }} ==Detecting Type of Persuasion : Is there Structure in Persuasion Tactics?== https://ceur-ws.org/Vol-2048/paper10.pdf
           Detecting type of Persuasion : Is there structure in
                          persuasion tactics?
              Rahul R Iyer                                Katia Sycara                             Yuezhang Li
         Carnegie Mellon University                Carnegie Mellon University               Carnegie Mellon University
            5000 Forbes Avenue                        5000 Forbes Avenue                       5000 Forbes Avenue
           Pittsburgh, PA 15213                      Pittsburgh, PA 15213                     Pittsburgh, PA 15213
          rahuli@andrew.cmu.edu                        katia@cs.cmu.edu                     yuezhanl@andrew.cmu.edu
ABSTRACT
Existing work on detecting persuasion in text makes use of lexical features for detecting persuasive tactics, without taking advantage of the possible structures inherent in the tactics used. In this paper, we propose a multi-class, unsupervised, domain-independent model for detecting the type of persuasion used in text, which makes use of the sentence structure inherent in the different persuasion tactics. Our work shows promising results compared to existing work and vector-embedding models.

KEYWORDS
persuasion detection, multi-class classification, text mining, unsupervised learning

1 INTRODUCTION
Persuasion is used in every type of forum these days, from politics and the military to social media. Detecting persuasion in text helps address many challenging problems: analyzing chat forums to find grooming attempts by sexual predators; training salespeople and negotiators; and developing automated sales-support systems. Furthermore, the ability to detect persuasion tactics in social flows such as SMS and chat forums can enable targeted and relevant advertising. Additionally, persuasion detection is very useful in detecting spam campaigns and promotions on social media, especially those that relate to terrorism. Persuasion identification is also potentially applicable to broader analyses of interaction, such as the discovery of those who shape opinion, or the cohesiveness and/or openness of a social group.

Existing work on detecting persuasion in text focuses mainly on lexical features, without taking advantage of the inherent structure present in persuasion tactics. In this work, we attempt to build an unsupervised, domain-independent model for detecting persuasion tactics that relies on the sentence structures of the tactics. Our contributions to the literature are: 1) we show that persuasive tactics have inherent sentential structures that can be exploited; 2) we propose an unsupervised approach that does not require annotated data; 3) we propose a way to synthesize prototype strings for the different persuasion tactics; 4) our approach takes much less time to execute compared to models that require training (for example, our approach is faster than Doc2Vec by a factor of almost 1.5); 5) our approach is domain-independent, in that it is independent of the vocabulary and can be applied to various domains such as politics, blogs, Supreme Court arguments, etc., with minimal changes (unlike vector-embedding models and other models that make use of lexical features, which depend on the vocabulary). We compare our proposed approach with existing methods that use lexical features, and also with some vector-embedding models, such as Doc2Vec [10].

We had an intuition that arguments in the persuasive space have similar sentential structures, based on examples we had observed. Consider two examples in the Reason category: 1a) "Are we to stoop to their level just because of this argument?", 1b) "I am angry at myself because I did nothing to prevent this"; and two examples in the Scarcity category: 2a) "Their relationship is not something you see everyday", 2b) "It is only going to go downhill from here". As we can see, there is a pattern in the structure between arguments in the same category, and there is a structural difference across these two categories. This led us to investigate the problem further and formulate our hypothesis.

The rest of the paper is organized as follows: Section 1.1 discusses related work in the area; Section 2 explains the problem that we are trying to tackle; Section 3 describes the different datasets used in the paper; Section 4 explains the proposed model and the baselines; Section 5 discusses the experimental results obtained; Section 6 goes over some brief applications of the model; and Section 7 concludes the paper with a discussion and future work.

1.1 Related Work
There has been some work in the literature on the detection of persuasion in texts. In [22], Young et al. present a corpus for persuasion detection derived from hostage negotiation transcripts. The corpus, called the "NPS Persuasion Corpus", consists of 37 transcripts from four sets of hostage negotiation transcriptions. Cialdini's model [5] was used to hand-annotate each utterance in the corpus. Nine categories of persuasion were used: reciprocity, commitment, consistency, liking, authority, social proof, scarcity, other, and non-persuasive. Algorithms such as Naive Bayes, SVM, and Maximum Entropy were then used for the classification.

Gilbert [8] presented an annotation scheme for a persuasion corpus. A pilot application of this scheme showed some agreement between annotators, but not a very strong one.




18th Workshop on Computational Models of Natural Argument
Floris Bex, Floriana Grasso, Nancy Green (eds)
16th July 2017, London, UK
After revising the annotation scheme, a more extensive study showed significant agreement between annotators. The authors in [14] determined that it is possible to automatically detect persuasion in conversations using three traditional machine learning techniques: naive Bayes, maximum entropy, and support vector machines. Anand et al. [1] describe the development of a corpus of blog posts annotated for the presence of attempts to persuade and the corresponding tactics employed in persuasive messages. The authors make use of lexical features like unigrams, topic features from LDA, and list count features from the Linguistic Inquiry and Word Count [7], as well as the tactics themselves, which are provided by human annotators. Tactics represent the type of persuasion being employed: social generalization, threat/promise, moral appeal, etc. Carlo et al. [18] analyze political speeches in a machine learning framework to classify the transcripts of political discourses according to their persuasive power, and to predict the sentences that trigger applause in the audience. In [19], Tan et al. look at the wording of a tweet to determine its popularity, as opposed to the general notion of author/topic popularity. The computational methods they propose perform better than an average human. In [12], Lukin et al. determine the effectiveness of a persuasive argument based on the audience reaction. They report a set of experiments testing at large scale how audience variables interact with argument style to affect the persuasiveness of an argument.

In addition to text, there has been some work on persuasion in the multimedia domain. In [17], Siddiquie et al. work on the task of automatically classifying politically persuasive videos and propose a multi-modal approach for the task. They extract audio, visual, and textual features that attempt to capture affect and semantics in the audio-visual content and sentiment in the viewers' comments. They work on each of these modalities separately and show that combining all of them works best. For their experiments, they use the Rallying a Crowd (RAC) dataset, which consists of over 230 videos from YouTube, comprising over 27 hours of content. Chatterjee et al. [4] aim to detect persuasiveness in videos by analyzing the speaker's verbal behavior, specifically based on lexical usage and paraverbal markers of hesitation (a speaker's stuttering or breaking his/her speech with filled pauses, such as um and uh). Paraverbal markers of hesitation have been found to influence how other people perceive the speaker's persuasiveness. The analysis is performed on a multimedia corpus of 1000 movie review videos annotated for persuasiveness. Park et al. collected and annotated a corpus of movie review videos in [15]. From this data, they demonstrate that the verbal and non-verbal behavior of the presenter is predictive of how persuasive they are, as well as predictive of the cooperative nature of a dyadic interaction.

Tasks similar to persuasion detection have been explored, such as sentiment detection and perspective detection. Lin et al. investigated the idea of perspective identification at the sentence and document level [11]. Using the articles from the bitterlemons website1, they were able to discriminate between Palestinian authors and Israeli authors who had written about the same topic. Bikel and Soren used machine learning techniques to differentiate between differing opinions [2]. They report an accuracy of 89% when distinguishing between 1-star and 5-star consumer reviews, using only lexical features.

1 http://www.bitterlemons.org

2 PROBLEM FORMULATION
2.1 Preliminaries
Persuasion is an attempt to influence a person's beliefs, attitudes, intentions, motivations, or behaviors. In persuasion, one party (the 'persuader') induces a particular kind of mental state in another party (the 'persuadee'), for example through flattery or threats; but unlike expressions of sentiment, persuasion also involves the potential change in the mental state of the other party. Contemporary psychology and communication science further require the persuader to be acting intentionally. Correspondingly, any instance of (successful) persuasion is composed of two events: (a) an attempt by the persuader, which we term the persuasive act, and (b) subsequent uptake by the persuadee. In this work, we consider only (a), the different persuasive acts, and how to detect them; working with (b) is a separate problem. Throughout the rest of the paper, when we say persuasive arguments, we mean the former, without taking the effectiveness of the persuasion into account. We are only interested in whether the arguments contain persuasion, and if so, the type.

2.2 Problem Statement
The main objective of our work is to detect whether a given piece of text contains persuasion or not. If it does, then we can look into the type of persuasion strategy being used, such as threat/promise, outcome, reciprocity, etc. In this paper we look at 14 different persuasion strategies, listed in Table 1. These are the common tactics for persuasive acts contributed by Marwell and Schmitt [13] and Cialdini [5], as well as argumentative patterns inspired by Walton et al. [21]. The intuition behind this investigation, along with some examples, was discussed in Section 1.

It has to be noted that an entire text is deemed to contain persuasion if it includes a few arguments that use some of these tactics to persuade. So, it is important to extract such arguments from the text before applying the persuasion model to them. Our approach therefore has two steps: 1) a very simple argument extractor model, to extract arguments from a given piece of text, and 2) the output of the extractor is fed into the persuasion detection model, which classifies the arguments into the different tactic classes. This process is represented in the flowchart shown in Figure 1. It has to be noted that, in this work, we are not concerned with the effectiveness of the persuasion.

3 DATASETS
We have used datasets from different domains for the experiments to test the robustness of our model. Below are the datasets used for persuasion detection:




                                            Figure 1: The outline of the problem considered


Outcomes:
- Outcome. Mentions some particular consequences from uptake or failure to uptake.
- Social Esteem. States that people the persuadee values will think more highly of them.
- Threat/Promise. Poses a direct threat or promise to the persuadee.
- Self-Feeling. States that uptake will result in a better self-valuation by the persuadee.

Generalizations:
- Good/Bad Traits. Associates the intended mental state with a "good" or "bad" person's traits.
- Deontic/Moral Appeal. Mentions duties or obligations, moral goodness or badness.

External:
- VIP. Appeals to authority (bosses, experts, trendsetters).
- Popularity. Invokes popular opinion as support for uptake.

Interpersonal:
- Favors/Debts. Mentions returning a favor or injury.
- Consistency. Mentions keeping promises or commitments.
- Empathy. Attempts to make the persuadee connect with someone else's emotional perspective.
- Scarcity. Mentions rarity, urgency, or opportunity of some outcome.

Other:
- Recharacterization. Reframes an issue by analogy or metaphor.
- Reasoning. Provides a justification for an argumentative point based upon additional argumentation schemes, e.g., causal reasoning, arguments from absurdity.

Table 1: List of Persuasion Tactics Used, taken from [1]. We do not consider the broad categories in our experiments, and only work with the finer categories.
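For readers who want to work with this taxonomy programmatically, it can be captured in a small data structure. This is a sketch in Python; the labels follow Table 1, and the variable names are ours:

```python
# Broad categories mapped to their finer persuasion tactics (Table 1).
# The experiments in the paper use only the 14 finer tactics.
TACTICS = {
    "Outcomes": ["Outcome", "Social Esteem", "Threat/Promise", "Self-Feeling"],
    "Generalizations": ["Good/Bad Traits", "Deontic/Moral Appeal"],
    "External": ["VIP", "Popularity"],
    "Interpersonal": ["Favors/Debts", "Consistency", "Empathy", "Scarcity"],
    "Other": ["Recharacterization", "Reasoning"],
}

# Flatten to the 14 fine-grained labels used for classification.
FINE_LABELS = [t for group in TACTICS.values() for t in group]
assert len(FINE_LABELS) == 14
```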



1. ChangeMyView: an active community on Reddit that provides a platform where users present their own opinions and reasoning, invite others to contest them, and acknowledge when the ensuing discussions change their original views. The training data period is 2013/01/01 - 2015/05/07, and the test data period is 2015/05/08 - 2015/09/01. The training dataset contains 3456 posts and the holdout dataset contains 807 posts. The dataset is organized as follows: each post, written by a user who wants his or her views changed, has two argumentative threads – one that is successful and one that is not. This has been used to determine the persuasion strategies employed by the successful thread. This dataset is used in [20].

2. Supreme Court Dialogs Corpus: This corpus contains a collection of conversations from the U.S. Supreme Court Oral Arguments: 1) 51,498 utterances making up 50,389 conversational exchanges, 2) from 204 cases involving 11 Justices and 311 other participants, 3) metadata like case outcome, vote of the Justice, gender annotation, etc. This dataset is used in [6].

3. Blog Authorship Corpus: This dataset, contributed by Pranav Anand et al. and used in [1], is a subset of the blog authorship corpus. Each directory corresponds to a blog. Each blog has sub-directories corresponding to days. Inside a day sub-directory, there may be multiple posts; posts are identified with an underscore followed by a number. Out of around 25048 posts, only around 457 were annotated with persuasive acts. Each blog post has been broken down into different "text tiles", which are a few sentences long, and each of these tiles is annotated with a persuasive tactic (if present).

4. Political Speeches: We collected a number of speeches by Donald Trump and Hillary Clinton, to analyze the kinds of persuasive tactics they use.

To train the argumentation extraction model, we use the Argumentation Essay Dataset2, which consists of about 402 essays. There are two files for each essay: the original essay and the annotated file. Annotations include breaking down the essay into different components: claim, premise, stance. This can be used to train a simple classifier to identify arguments from text passages.

The Blog Authorship Corpus is already annotated, as noted above. In order to have the ground truth, i.e. the annotations, for the other datasets, we needed to annotate the arguments of the corpora with the persuasion tactics mentioned in Table 1. For this, we used Amazon Mechanical Turk3. Using the argument extraction model, we extracted arguments from all of the corpora combined (excluding the blog authorship corpus)4. We had each argument annotated by two different turkers, and the turkers were given the freedom to classify a piece of text as either a non-argument or as one of the tactics from Table 1. There was about 65% inter-annotator agreement between the turkers.

2 https://www.ukp.tu-darmstadt.de/data/argumentation-mining/argument-annotated-essays-version-2/
3 https://www.mturk.com/mturk/
4 The whole dataset, along with the annotation guidelines, classification criteria and the prototype strings (both median and synthetic) for all the persuasion tactics, can be found at https://github.com/rrahul15/Persuasion-Dataset




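The agreement figure above appears to be raw percent agreement; assuming that measure (the text does not name it), it would be computed as below. The label lists here are made-up toy data, not the actual annotations:

```python
# Percent agreement: fraction of items on which two annotators
# assigned the same label. Toy labels for illustration only.
def percent_agreement(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)

ann1 = ["Scarcity", "Reasoning", "VIP", "Outcome"]
ann2 = ["Scarcity", "Reasoning", "Empathy", "Outcome"]
print(percent_agreement(ann1, ann2))  # 0.75
```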
Conflicts were resolved manually. After this, we had a total of 1457 persuasive arguments from all the datasets combined. The distribution of arguments from the different datasets is given in Table 2. The guidelines for annotation were built on the ones provided in [1], with some changes4.

Dataset               # Arguments
ChangeMyView          362
Supreme Court         440
Political Speeches    198
Blog                  457
Table 2: Distribution of arguments from the different datasets

4 TECHNICAL APPROACH
In this section, we describe our proposed models, along with a couple of baselines for comparison. We describe the baselines in the following subsection. It has to be kept in mind, as noted in Section 3, that the dataset used consists solely of persuasive arguments.

4.1 Baselines
Here, we discuss the different baselines that we use for comparison. We describe a simple supervised approach that makes use of lexical features, and then move on to more complicated models involving vector embedding. In all the supervised approaches, we use an 80:20 split for training and testing.

(1) Simple Supervised: Here, the learning phase involved extracting simple textual features from the training set (unigrams and bigrams, without punctuation) and then training an SVM (Support Vector Machine) model, using Sequential Minimal Optimization (SMO) [16], to learn a model from these features that could be applied to the holdout set. This model was then used to test the remaining posts.

(2) Supervised Document Vectors: This method uses the Doc2Vec model proposed by Le and Mikolov [10]. First, the arguments were separated into different categories based on the persuasion tactic. Then, the Doc2Vec model was applied to each such cluster to embed all the arguments into vectors. The prototype vector for each category was then chosen as the mean of all the vectors in that category. To classify the holdout set, one computes the vector of the argument in consideration and then computes the cosine similarity to the prototype vectors. The category with the highest similarity is the one chosen. The cosine similarity between two vectors a and b is defined as follows:

    similarity = (a · b) / (‖a‖ ‖b‖)    (1)

(3) We also compare our approach with that proposed by [1]. Here, the authors make use of different features to account for fewer word-dependent features: a) 25 topic features, which were extracted using Latent Dirichlet Allocation (LDA) [3] with a symmetric Dirichlet prior, and b) 14 tactic count features, i.e., a vector consisting of the counts of the tactics. Naive Bayes was used for the classification, to assess the degree to which these feature sets complement each other.

4.2 Proposed Approach
In this section, we describe the proposed unsupervised, domain-independent approach to identify the persuasion tactics in a given set of arguments. By domain independence, we mean that our proposed model is robust across different genres, be it political speeches or blogs; this is important because different domains might have their own vocabularies. Before heading into the details of the algorithm, we present a few useful definitions.

4.2.1 Preliminaries. In this subsection, we describe a few preliminary concepts.

Parse Tree: A parse tree is an ordered, rooted tree that represents the syntactic structure of a sentence according to some context-free grammar. It captures the sentence structure: multiple types of sentences can have a similar sentence structure, even if their vocabularies are not the same. This is the essence of the approach.

Edit Distance: Edit distance is a way of quantifying how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. Given two strings a and b on an alphabet Σ, the edit distance d(a, b) is the minimum number of edit operations that transforms a into b. The edit operations are: 1) insertion of a single symbol, 2) deletion of a single symbol, and 3) substitution of a single symbol for another.

Median String: The median string of a set of strings is defined to be that element of the set which has the smallest sum of distances from all the other elements [9]. In our case, the distance between strings is the edit distance.

4.2.2 Parse-Tree Model. We propose a domain-independent classification. The idea is that persuasive arguments may have certain characteristic sentence structures, which we might be able to exploit. The training and testing phases are given below.

Training Phase
(1) As mentioned earlier, we have 14 different categories of persuasive tactics. We obtain one representative prototype argument for each category.




We obtain these in two different ways, which we discuss after detailing the algorithm.
(2) We then perform phrase-structure parsing on each of these prototype arguments to obtain their parse trees, which give the structure of the argument.
(3) These parse trees are then converted into parse strings, keeping the structure intact; the leaf nodes (the terminal symbols, namely words) are removed, to get a domain-independent representation of the structure of the argument.
(4) By now, we have representative prototype parse strings for each persuasive tactic, i.e. 14 different prototype parse strings. We use these strings to classify a new argument into one of the persuasive categories. As mentioned earlier, every instance in the dataset is a persuasive argument. This can be construed as the "training phase".

Testing Phase
(1) For the testing phase, we have to classify a new argument into one of the categories. Since each new argument in the dataset is persuasive, we do not have to worry about non-arguments. We build a model to account for non-arguments in Section 6.
(2) Given a new argument, compute its parse string, similar to the procedure used in obtaining the parse strings for the prototype arguments.
(3) To then classify this argument, we compute the normalized edit distances (normalized by the lengths of the strings) between its parse string and the proto-

performance across successive trials stabilizes as we increase the set size. The performance is best when we consider all the arguments, and is quite stable and close to the best when we consider 30% of all the arguments. So, we settle on 30% as the ideal set size, because we get stable performance with a very small loss in accuracy and far fewer arguments. Examples of the prototype parse strings, using the median method, for two different persuasion tactics, Reasoning and Scarcity, are given in Figure 7. We display prototypes for only two of the tactics for purposes of brevity4.

Now, although different arguments in the same category are structurally similar, they may each have certain parts in their structure that capture the essence of that category much better. We rule out taking advantage of these individual segments when we pick one median argument out of the set. This led us to the second method of obtaining prototype strings.

(2) Synthetic Prototype: We noted earlier that there could be certain segments in different arguments of the same category that capture the essence of the category better. To accommodate this, we chop up the different arguments in a set into a number of segments and choose different segments to synthesize an artificial prototype string. To obtain the best ith segment for the synthetic string, we choose the median of the ith segments of all strings in the set. It has to be noted that we chop the strings
       type parse strings of each category.
                                                                              uniformly. This process of synthesizing the prototype
   (4) The persuasive category with the least edit distance
                                                                              string is illustrated in Figures 4 and 5. As before, we
       is logically the most structurally similar to the given
                                                                              need some parameters to tune here. In addition to
       argument, and hence the argument is classified into
                                                                              the optimal set size, we also need to determine the
       that category.
                                                                              optimal number of segments.
   (5) This process is explained in the flowchart, given in
                                                                                  In order to determine the optimal number of seg-
       Figure 2.
                                                                              ments and set size, we conduct additional experi-
  Choosing the Prototype Strings We propose two meth-                         ments on the supreme court dataset, with different
ods to obtain the prototype argument strings.                                 parameter values to observe the trend as before. We
   (1) Median as the Prototype: Take a set of arguments                       choose different set sizes: 2%, 5%, 10%, 20%, 30%
       from each persuasion category and obtain the proto-                    and All, as before, and different number of segments:
       type string for that category as the median string of                  2, 3, 5, 7, and 9. For each (set size, number of seg-
       the set. We now have to determine the ideal size of                    ment) pair, we conduct 5 trails, choosing a random
       the set in question. For obvious reasons, we get the                   sample for the sets each time, and compute the av-
       best representation if we consider all arguments of                    erage performance. This trend is shown in Figure 6.
       that category, but this would require a completely                     We do not show the performance for different trials
       annotated dataset (making the model supervised).                       as before, rather just the average across the trials.
          In order to determine the ideal set size, we con-                   We see that the trend stabilizes as we increase the
       duct additional experiments on a particular dataset,                   set size, as before, and the accuracy improves as
       the supreme court dataset, with different parameter                     we consider more number of segments. But here, it
       values to observe the trend of the performance. We                     is a tradeoff between accuracy and speed because
       choose different set sizes: 2%, 5%, 10%, 20%, 30% and                   having a large number of segments will require us to
       All (the set sizes chosen is a percentage of the total                 compute the median for every segment. We settle for
       number of arguments in that category). For each set                    30% as the ideal set size and 9 as the ideal number of
       size, we conduct 5 different trials, choosing a random                  segments. The prototype strings using this method
       sample each time, to see the average performance.
       This trend is shown in Figure 3. As we can see, the
                                                                   5




   58                                                                  18th Workshop on Computational Models of Natural Argument
                                                                                   Floris Bex, Floriana Grasso, Nancy Green (eds)
                                                                                                       16th July 2017, London, UK
           are not parse strings of meaningful sentences and so       5.1      Evaluation Metrics
           we do not display them here.                               The metrics used for evaluation are listed below.
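The pipeline above can be condensed into a few functions. The following is a minimal sketch, not the authors' code: it assumes arguments are already parsed into bracketed strings (by any off-the-shelf constituency parser), it reads "median of a set of strings" as the medoid (the member minimizing total edit distance to the rest), and it normalizes the edit distance by the longer of the two strings; all function names are our own illustrative choices.

```python
import re

def strip_leaves(parse: str) -> str:
    """Drop terminal words from a bracketed parse, keeping only the
    phrase/POS labels: "(NP (PRP I))" -> "(NP (PRP))" (steps 2-3)."""
    return re.sub(r"\(([^\s()]+) [^()]+\)", r"(\1)", parse)

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete
                           cur[j - 1] + 1,             # insert
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def classify(parse: str, prototypes: dict) -> str:
    """Testing steps 2-4: return the tactic whose prototype has the
    smallest length-normalized edit distance to the argument."""
    s = strip_leaves(parse)
    return min(prototypes, key=lambda t: edit_distance(s, prototypes[t])
                                         / max(len(s), len(prototypes[t])))

def median_string(strings):
    """'Median as the Prototype', read here as the medoid: the member
    minimizing the total edit distance to the rest of the set."""
    return min(strings, key=lambda s: sum(edit_distance(s, t) for t in strings))

def synthetic_prototype(strings, k: int) -> str:
    """'Synthetic Prototype': chop each string uniformly into k
    segments and concatenate the segment-wise median strings."""
    step = -(-max(len(s) for s in strings) // k)  # uniform segment width
    segs = [[s[i * step:(i + 1) * step] for i in range(k)] for s in strings]
    return "".join(median_string([s[i] for s in segs]) for i in range(k))
```

A nearest-prototype classifier of this kind needs no training beyond choosing the prototypes, which is what makes the approach unsupervised and domain-independent.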
Figure 3: Trend Graph for Median as the Prototype string

Figure 5: Synthesis of the prototype string. The coloured segments are the medians of those string segments. In this case, the median of each segment comes from a different string. This is for illustrative purposes only; it need not always be the case.

Figure 6: Trend Graph for Synthetic Prototype string

5     EXPERIMENTAL RESULTS
In this section, we present the results of our proposed model and compare it with the different approaches described earlier. We first discuss the metrics that we have used for evaluation.

5.1     Evaluation Metrics
The metrics used for evaluation are listed below.
   (1) Precision: the percentage of arguments the system identified as having a particular tactic that in fact had that tactic,

       Precision_t = |{retrieved documents} ∩ {relevant documents}| / |{retrieved documents}|     (2)

       where Precision_t is the precision for tactic t.
   (2) Recall: the percentage of arguments of a particular tactic that the system classified into that category,

       Recall_t = |{retrieved documents} ∩ {relevant documents}| / |{relevant documents}|     (3)

       where Recall_t is the recall for tactic t.
   (3) F1-measure: the harmonic mean of precision and recall,

       F1_t = (2 × Precision_t × Recall_t) / (Precision_t + Recall_t)     (4)

       where F1_t is the F1 measure for tactic t.
It is important to note that precision, recall and F1 measure are computed for each persuasion tactic separately, akin to a binary classifier. In our experiments, we report the mean of these measures over all the tactics.

5.2     Results
First, we run the proposed parse-tree model on the arguments extracted from the datasets and obtain the average per-category accuracy. The per-category accuracy is defined as the percentage of accurate classifications for a specific category; the categories are the persuasion tactics in our case. For this task, we combined the arguments from each of the 4 datasets to form a combined set, in order to get an average performance estimate (refer to Table 2 for the distribution of arguments in each dataset). We classified the arguments in the combined set and calculated the fraction of correct classifications for each category. The results are given in Table 3. We do not consider the broad categories in our experiments, and work only with the finer categories.
   We also compute the distribution of the tactics in the different datasets. We do this by classifying the arguments in each of the 4 datasets and calculating the frequency of appearance of each tactic in the corpus, as a percentage over all the arguments in that corpus. These are listed in Tables 4-7. The ranking of the tactics in these tables, with respect to the percentages, aligns closely with manual evaluations. These distributions are shown to give an idea of the ranking of the tactics as predicted by the algorithm (which makes sense intuitively).
   ChangeMyView: Each user posts his/her stance on a particular topic and challenges others to change his/her opinion. For example, one of the posts was about a man who did not believe in essential oils and believed that they were destructive, whereas his wife believed the oils were beneficial. He requested the other users to make him change his mind about essential oils by giving him sufficient evidence. If a person is
successful in changing the mind of the OP (Original Poster), the OP gives that person a delta in their comments. All the conversations are monitored by Reddit and hence the quality is high.
   In our dataset, for each post there are two threads of comments – one successful in changing the mind of the OP and one that is unsuccessful. We analyzed the persuasive strategies used by the successful threads, because these are examples of good uses of the different persuasion tactics. For our purpose of classifying tactics, we could also have used the unsuccessful threads (we are not concerned with the uptake of the persuasion by the persuadee), but we chose not to. First, we extracted the positive comments from the threads (those which were given a delta by the OP). We then applied the parse-tree persuasion model developed earlier to these texts to perform the classification. Many of the comments had links to other credible sources which listed facts opposed to the OP's view; we did not venture into these links. After determining the persuasion strategies used in the comments, we observed that Reasoning and Outcomes were the most frequently used strategies. A more detailed distribution of tactics is given in Table 4.

Figure 2: The parse-tree model explained

Figure 4: Outline for Synthesis of the Prototype strings

(a) Parse Tree for Scarcity
Sentence: Their relationship is not something you see everyday
Parse String: (NP+SBAR+S (S (NP (PRP$) (NN)) (VP (VBZ) (RB) (NP (NN)))) (S (NP (PRP)) (VP (VB) (NP (DT) (NN)))))

(b) Parse Tree for Reasoning
Sentence: I'm angry because of this, I did NOTHING
Parse String: (SBAR+S (NP (PRP)) (VP (VBP) (VB) (SBAR (IN) (S (PP (IN) (NP (DT))) (,) (NP (PRP)) (VP (VBD) (ADJP (JJ)))))))

Figure 7: Examples of parse trees with their parse strings for 2 persuasion tactics, Scarcity and Reasoning. This is just an illustration of what the parse strings look like.

      Category                 Accuracy
      Reasoning                    79.8
      Deontic/Moral Appeal         69.6
      Outcome                      65.7
      Empathy                      61.3
      Threat/Promise               58.2
      Popularity                   56.4
      Recharacterization           54.9
      VIP                          53.5
      Social Esteem                50.3
      Consistency                  45.6
      Favors/Debts                 41.1
      Self-Feeling                 37.7
      Good/Bad Traits              35.5
      Scarcity                     29.6
Table 3: Per-category Accuracy for the parse-tree model, when 14 categories are used

      Tactic                 Percentage
      Reasoning                  40.7
      Outcomes                   41.2
      Good/Bad traits            10.0
      Social                      8.1
Table 4: Distribution of Persuasion Tactics used in the ChangeMyView dataset.
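The per-tactic evaluation of section 5.1 (one-vs-rest precision, recall and F1, then averaged over tactics) can be computed as follows. This is an illustrative sketch; `per_tactic_prf` is our own helper name, not from the paper.

```python
def per_tactic_prf(gold, pred):
    """Per-tactic precision, recall and F1, treating each tactic as a
    one-vs-rest binary problem, plus the macro-average over tactics."""
    tactics = set(gold) | set(pred)
    scores = {}
    for t in tactics:
        tp = sum(g == t and p == t for g, p in zip(gold, pred))
        retrieved = sum(p == t for p in pred)   # system said tactic t
        relevant = sum(g == t for g in gold)    # truly tactic t
        prec = tp / retrieved if retrieved else 0.0
        rec = tp / relevant if relevant else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[t] = (prec, rec, f1)
    macro = tuple(sum(s[i] for s in scores.values()) / len(scores)
                  for i in range(3))
    return scores, macro
```

The macro-average matches the paper's choice of reporting the mean of the per-tactic measures rather than pooling all decisions into one confusion count.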
   Supreme Court Dataset: This dataset includes the transcripts of the conversation exchanges over 204 cases, along with the outcome of each case. The outcome is either Respondent or Petitioner. The petitioner is the person who files the petition/case against a particular party requesting action on a certain matter, and the respondent is the person against whom the said relief is sought. We have collected all the cases where the petitioner has won and analyzed the argument structure.
   We have taken these cases and analyzed the arguments. Using the argumentation model, we were able to identify the key arguments, and then, using the parse-tree model, we were able to classify the type of argument that was used. We found that most of the presented arguments were Deontic Appeal and Reasoning. The distribution of arguments is given in Table 5.

      Tactic                   Percentage
      Deontic Appeal               33.3
      Reasoning                    35.5
      Recharacterization           12.6
      Outcome                       8.6
      Empathy                       5.2
      VIP                           4.8
Table 5: Distribution of Persuasion Tactics used in the Supreme Court dataset.

   Political Speeches: We analyze the persuasive tactics present in the speeches of political candidates, specifically those of Donald Trump and Hillary Clinton. These distributions are given in Tables 6 and 7. We observed that the most frequently used tactic by Trump was Outcome ("Make America Great Again"), while for Hillary the most frequently used tactic was Empathy.

      Tactic           Percentage
      Outcome              39.1
      Principles           31.2
      VIP                  18.5
      Reasoning            11.2
Table 6: Distribution of Persuasion Tactics used in Trump's Speeches.

      Tactic           Percentage
      Empathy              35.2
      Consistency          33.8
      Favors/Debts         18.2
      Social               12.8
Table 7: Distribution of Persuasion Tactics used in Hillary's Speeches.

   Finally, we present the results of the performance of the different algorithms, described earlier, in Table 8. We also performed these experiments in a binary setting: whether a given argument contains persuasion or not; these results are presented in Table 9. We ran these experiments on the arguments extracted from each dataset (refer to Table 2 for the distribution of arguments in each dataset). The performances are measured by the precision (P), recall (R) and F1 measure (F), as described earlier.
   As can be seen, the domain-independent parse-tree model with synthetic prototype strings performs best, almost 7–8% better than Doc2Vec. Thus, our intuition that different segments of arguments in the same category capture the essence of the category better than others is validated. It should be noted that in a multi-class classification setting, the F1 scores obtained in Table 8 are reasonable. It should also be noted that our model runs faster than Doc2Vec, by a factor of almost 1.5.

5.3     Sensitivity Analysis
We also performed a sensitivity analysis on the parse-tree model to observe its robustness. For this, we combined the arguments from all 4 datasets and ran the model on the combined set: 1) using only 10 instances of the dataset, 2) using only 100 instances, 3) using 1000 instances, and 4) using all the instances. The prototype argument strings for this dataset were synthesized according to the method described earlier. The results are given in Table 10. For the cases that did not involve the whole dataset, we randomly sampled 5 times from the whole corpus and averaged the results. As we can see, the results show that the proposed model is relatively robust and largely invariant to the amount of data.
   Another aspect of sensitivity analysis, involving variation of the parameters of the models, was discussed earlier in section 4.2.2.

                     Metrics
      Data       P        R        F
      10       0.370    0.375    0.372
      100      0.402    0.393    0.397
      1000     0.491    0.387    0.432
      ALL      0.442    0.412    0.426
Table 10: Sensitivity Analysis of the parse-tree model using 10, 100, 1000 and all instances of data.
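The sensitivity protocol above (fixed subsample sizes, 5 random trials each, results averaged) reduces to a small loop. This sketch is not the authors' code: it abstracts the actual model and metric behind an `evaluate` callback, and the function and parameter names are ours.

```python
import random
from statistics import mean

def sensitivity(instances, evaluate, sizes=(10, 100, 1000, None), trials=5, seed=0):
    """For each sample size (None = the whole corpus), draw `trials`
    random subsamples, evaluate each, and report the average score."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    results = {}
    for n in sizes:
        if n is None or n >= len(instances):
            results["ALL" if n is None else n] = evaluate(instances)
        else:
            results[n] = mean(evaluate(rng.sample(instances, n))
                              for _ in range(trials))
    return results
```

Plugging in an `evaluate` function that runs the classifier and returns mean F1 reproduces the structure of Table 10.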
6     APPLICATIONS
The persuasion-detection model proposed here is very versatile and can be applied in many scenarios. A few applications are detailed below.

Basic Argument vs. Non-argument Classifier
The proposed parse-tree model can be used as an argument classifier. To test its applicability here, we collected a set of 1000 simple sentences5 with fewer than 10 words each. Our intuition was that such simple phrases should have a very low similarity with the different persuasion categories (as they are structurally different). So, we needed to establish a threshold to classify a particular piece of text as an argument versus a non-argument: if the normalized edit distance similarities between the given string and the prototypes of all the different categories are less than the threshold, classify it as a non-argument; else, classify it into its correct persuasion category. Choosing this threshold is not easy, for these reasons:
1) Higher threshold: lower chance of classifying a non-argument as an argument, and higher chance of classifying an argument as a non-argument.
2) Lower threshold: higher chance of classifying a non-argument as an argument, and lower chance of classifying an argument as a non-argument.
   So, we needed to choose a threshold that is neither too high nor too low. We tried different thresholds; a graph of threshold vs. accuracy is shown in Figure 8, where the accuracy is the mean F1 score. As we can see, a threshold of 0.1 works best; at this threshold, the F1 score is 0.412. The performance of this system is not as good as with just persuasive arguments. It has to be noted that this is not a binary argument vs. non-argument problem: the F1 score presented here is for the problem of 14 persuasion tactic categories vs. non-arguments.
   This model cannot yet be used, in this form, as a robust argument classifier, because some non-arguments have structures similar to some of the persuasion tactics described earlier. For example, consider the sentence: "The men smoked and most of the women knitted while they talked". Although this is a non-argument, the model could confuse it with one of the persuasion categories like Reason/Promise. It is for this reason that we consider very simple, straightforward non-argument sentences of a few words. To build a classifier with such capabilities, we would need to incorporate domain-independent lexical features into the parse-tree model (more discussion in section 7.1).

Figure 8: Graph of Threshold vs. Accuracy (F1 Score)

Political Speech Analysis
The parse-tree model can also be used to detect spam campaigns on social media and to detect terrorist campaigns. We analyzed some speeches of Osama Bin Laden to see what kind of tactics he used to influence people. From the analysis, mostly Empathy was used. The detailed distribution is given in Table 11.

      Tactic                 Percentage
      Empathy                    29.7
      Recharacterization         20.3
      Reasoning                  12.9
      Good/Bad Traits            20.2
      Scarcity                    8.0
      Outcome                     8.9
Table 11: Distribution of Persuasion Tactics used in Osama Bin Laden's Speeches.

7     DISCUSSION
We have proposed a fairly simple, domain-independent, unsupervised model for detecting the types of persuasion used in text. This model can be used in any context/domain because it only uses the inherent structure of the persuasion tactics; this versatility gives it a variety of applications. Almost all persuasive arguments can be classified under the categories mentioned earlier. It has to be noted that we did not include lexical features in our model because that would make the model slightly domain-dependent; it is for this reason that we focused only on the structural aspects. Of course, as we mention in section 7.1, it might be possible to include a few lexical terms like because, if, while etc., which are domain-independent and tactic-dependent, to further strengthen our model.
   From the obtained results, we see that our model's accuracy is highest for the following persuasion tactics (refer to Table 3): reasoning, deontic/moral appeal, outcome, empathy. This is in agreement with the observations made by the authors in [1], which further validates our model.

5 http://www.cs.pomona.edu/~dkauchak/simplification/
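The threshold test of section 6 can be sketched as follows. One assumption is made loudly here: we read "normalized edit distance similarity" as one minus the length-normalized edit distance, so a text is rejected as a non-argument when no prototype's similarity reaches the threshold. The function names are ours, not the authors'.

```python
import re

def strip_leaves(parse: str) -> str:
    """Drop terminal words from a bracketed parse string."""
    return re.sub(r"\(([^\s()]+) [^()]+\)", r"(\1)", parse)

def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def classify_or_reject(parse: str, prototypes: dict, threshold: float = 0.1):
    """Return the closest tactic, or None (non-argument) when every
    prototype's normalized similarity falls below the threshold."""
    s = strip_leaves(parse)
    sims = {t: 1 - levenshtein(s, p) / max(len(s), len(p))
            for t, p in prototypes.items()}
    best = max(sims, key=sims.get)
    return best if sims[best] >= threshold else None
```

With the paper's best-performing threshold of 0.1, structurally dissimilar simple sentences fall below every prototype's similarity and are rejected.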
   From the metrics computed, we see that the performance of the proposed model, with synthesized prototype strings, is better than that of vector-embedding models such as Doc2Vec, which uses deep learning. There is almost a 7-8% improvement in performance over Doc2Vec, which is a reasonable margin across 14 categories in total. This is very interesting because: 1) the Doc2Vec model that we used as a baseline uses all the data, whereas our model uses only a very small subset of the dataset to compute the initial set of 14 prototype argument strings, one for each category; and 2) the Doc2Vec model requires a training phase that can take a considerable amount of time, given its complex structure. In addition, our model runs faster than Doc2Vec by a factor of almost 1.5, as mentioned earlier. The reason our model beats methods based on lexical features may be that domain-specific words restrict their performance.
   We see that the baseline SVM, trained on lexical features, had considerably high precision but fell short on recall. That is why we use the F1 measure, which combines both aspects of a model: precision and recall. The most suitable model is the one with a high F1 score, which can only be achieved with high values for both precision and recall. We also see that our model performs better, at both binary and multi-class classification, than the approach used in [1].
   Sensitivity analysis was also performed to test the robustness of the method. As the results show, the method is fairly stable and robust with respect to the size of the data.
   The main takeaway is that complex methods like neural networks may not be the best choice for every task. We have shown that a very simple method like the one proposed here can outperform the methods discussed in this paper, while avoiding the high computational costs and the opacity of results of neural-computation-based methods.

7.1    Future Work
There is scope for improving the proposed model further. As of now, we use only the sentence structures of the different persuasion tactics. We have not made use of the fact that there could be domain-independent words for each tactic, such as because, if, and while. Incorporating such keywords into the model could improve performance, and we will investigate this in the future. Additionally, we will apply our approach to different applications, such as detecting spam campaigns, measuring how effective a spam campaign can be (a combination of persuasiveness and connectivity in the network, which can be measured by PageRank), and identifying terrorist campaigns.

8    ACKNOWLEDGEMENTS
This work has been funded by ARO award #W911NF-13-1-0416.

REFERENCES
 [1] Pranav Anand, Joseph King, Jordan L Boyd-Graber, Earl Wagner, Craig H Martell, Douglas W Oard, and Philip Resnik. 2011. Believe Me-We Can Do This! Annotating Persuasive Acts in Blog Text. In Computational Models of Natural Argument.
 [2] Daniel M Bikel and Jeffrey Sorensen. 2007. If we want your opinion. In International Conference on Semantic Computing (ICSC 2007). IEEE, 493–500.
 [3] David M Blei, Andrew Y Ng, and Michael I Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research 3, Jan (2003), 993–1022.
 [4] Moitreya Chatterjee, Sunghyun Park, Han Suk Shim, Kenji Sagae, and Louis-Philippe Morency. 2014. Verbal behaviors and persuasiveness in online multimedia content. SocialNLP 2014 (2014), 50.
 [5] Robert B Cialdini. 2001. Influence: Science and Practice. Boston: Allyn & Bacon (2001).
 [6] Cristian Danescu-Niculescu-Mizil, Lillian Lee, Bo Pang, and Jon Kleinberg. 2012. Echoes of power: Language effects and power differences in social interaction. In Proceedings of the 21st International Conference on World Wide Web. ACM, 699–708.
 [7] James W Pennebaker, Martha E Francis, and Roger J Booth. 1993. Linguistic Inquiry and Word Count. Technical Report. Dallas, TX: Southern Methodist University.
 [8] Henry T Gilbert. 2010. Persuasion detection in conversation. Ph.D. Dissertation. Monterey, California. Naval Postgraduate School.
 [9] Teuvo Kohonen. 1985. Median strings. Pattern Recognition Letters 3, 5 (1985), 309–313.
[10] Quoc V Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. In ICML, Vol. 14. 1188–1196.
[11] Wei-Hao Lin, Theresa Wilson, Janyce Wiebe, and Alexander Hauptmann. 2006. Which side are you on? Identifying perspectives at the document and sentence levels. In Proceedings of the Tenth Conference on Computational Natural Language Learning. Association for Computational Linguistics, 109–116.
[12] Stephanie Lukin, Pranav Anand, Marilyn Walker, and Steve Whittaker. 2017. Argument Strength is in the Eye of the Beholder: Audience Effects in Persuasion. (2017).
[13] Gerald Marwell and David R Schmitt. 1967. Dimensions of compliance-gaining behavior: An empirical analysis. Sociometry (1967), 350–364.
[14] Pedro Ortiz. 2010. Machine learning techniques for persuasion detection in conversation. Ph.D. Dissertation. Monterey, California. Naval Postgraduate School.
[15] Sunghyun Park, Han Suk Shim, Moitreya Chatterjee, Kenji Sagae, and Louis-Philippe Morency. 2014. Computational analysis of persuasiveness in social multimedia: A novel dataset and multimodal prediction approach. In Proceedings of the 16th International Conference on Multimodal Interaction. ACM, 50–57.
[16] John Platt. 1998. Sequential minimal optimization: A fast algorithm for training support vector machines. (1998).
[17] Behjat Siddiquie, Dave Chisholm, and Ajay Divakaran. 2015. Exploiting Multimodal Affect and Semantics to Identify Politically Persuasive Web Videos. In Proceedings of the 2015 ACM International Conference on Multimodal Interaction. ACM, 203–210.
[18] Carlo Strapparava, Marco Guerini, and Oliviero Stock. 2010. Predicting Persuasiveness in Political Discourses. In LREC.
[19] Chenhao Tan, Lillian Lee, and Bo Pang. 2014. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. arXiv preprint arXiv:1405.1438 (2014).
[20] Chenhao Tan, Vlad Niculae, Cristian Danescu-Niculescu-Mizil, and Lillian Lee. 2016. Winning arguments: Interaction dynamics and persuasion strategies in good-faith online discussions. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 613–624.
[21] Douglas Walton, Christopher Reed, and Fabrizio Macagno. 2008. Argumentation Schemes. Cambridge University Press.
[22] Joel Young, Craig H Martell, Pranav Anand, Pedro Ortiz, Henry Tucker Gilbert IV, et al. 2011. A Microtext Corpus for Persuasion Detection in Dialog. In Analyzing Microtext.




                             Blogs              ChangeMyView                 Supreme Court          Political Speeches
 Method                P      R       F        P      R     F            P         R       F        P        R       F
 SVM Baseline        0.594   0.132   0.216   0.511   0.107    0.176   0.605      0.051   0.094    0.454    0.038    0.070
 NB+Tactic           0.361   0.483   0.413   0.309   0.465    0.371   0.319      0.511   0.393    0.267    0.417    0.325
 NB+LDA              0.098   0.132   0.112   0.071   0.116    0.088   0.083      0.151   0.107    0.032    0.045    0.037
 NB+Tactic+LDA       0.114   0.229   0.152   0.099   0.212    0.135   0.138      0.242   0.176    0.041    0.145    0.064
 S-Doc2Vec           0.493   0.439   0.464   0.472   0.427    0.448   0.496      0.442   0.467    0.411    0.392    0.401
 ParseTree           0.498   0.443   0.468   0.491   0.419    0.452   0.477      0.462   0.468    0.418    0.371    0.393
 ParseTree+SP        0.531   0.470   0.498   0.539   0.448    0.489   0.521      0.483   0.501    0.464    0.405    0.432
Table 8: Comparison of results for the different methods, considering all the 14 tactics. Here, ParseTree is the first
model proposed, ParseTree+SP stands for the parse-tree model with the synthetic prototype strings, NB stands for
Naive Bayes, LDA stands for Latent Dirichlet Allocation, and S-Doc2Vec stands for the supervised version of the
Doc2Vec method. NB+Tactic, NB+LDA, NB+Tactic+LDA are the features used by the authors in [1]




                             Blogs              ChangeMyView                 Supreme Court          Political Speeches
 Method                P      R       F        P      R     F            P         R       F        P        R       F
 SVM Baseline        0.741   0.179   0.288   0.721   0.159    0.261   0.763      0.171   0.279    0.651    0.107    0.184
 NB+Tactic           0.537   0.672   0.597   0.515   0.645    0.573   0.533      0.656   0.588    0.467    0.599    0.525
 NB+LDA              0.133   0.285   0.181   0.111   0.262    0.156   0.137      0.282   0.184    0.051    0.203    0.082
 NB+Tactic+LDA       0.169   0.437   0.244   0.144   0.416    0.214   0.154      0.441   0.228    0.097    0.354    0.152
 S-Doc2Vec           0.732   0.599   0.659   0.705   0.589    0.642   0.723      0.627   0.672    0.648    0.533    0.585
 ParseTree           0.701   0.603   0.648   0.692   0.589    0.636   0.716      0.614   0.661    0.633    0.539    0.582
 ParseTree+SP        0.737   0.629   0.679   0.721   0.623    0.668   0.751      0.642   0.692    0.661    0.562    0.607
Table 9: Comparison of results for the different methods in a binary setting. Here, ParseTree is the first model
proposed, ParseTree+SP stands for the parse-tree model with the synthetic prototype strings, NB stands for Naive
Bayes, LDA stands for Latent Dirichlet Allocation, and S-Doc2Vec stands for the supervised version of the Doc2Vec
method. NB+Tactic, NB+LDA, NB+Tactic+LDA are the features used by the authors in [1]







