    An attempt to combine features in classifying argument
              components in persuasive essays
                    Yunda Desilia, Velizya Thasya Utami, Cecilia Arta, Derwin Suhartono
                                                      School of Computer Science
                                                       Bina Nusantara University
                                                          Jakarta, Indonesia
             {yunda.desilia, velizya.utami, cecilia.arta}@binus.ac.id, dsuhartono@binus.edu


ABSTRACT
Several approaches have been proposed so far for detecting and classifying argumentation in persuasive essays. In this paper, we propose some new features on top of the state-of-the-art research in argumentation mining. We grouped 68 features into 8 categories: structural, lexical, indicator, contextual, syntactic, prompt similarity, word embedding, and discourse features. Besides handcrafted features, we also utilized word embeddings as features. At the end of this paper, we present a comparison between each group of features for classifying argument components. 402 persuasive essays were used. We found that structural features were the most significant feature group while discourse features were the least. After combining all features, we obtained an accuracy of 79.96%, slightly outperforming the state-of-the-art accuracy of 77.3%.

Keywords
argument component, feature, word embedding, argumentation mining, persuasive essay

1. INTRODUCTION
Argumentation is a process of building, exchanging, and evaluating arguments in interaction with other arguments. An argument is a set of premises, or evidence/facts, given to support a claim (Palau and Moens, 2009). The objective of argumentation is to make the audience believe that the stated idea, thought, or opinion is true and proven. Argumentation mining aims to detect the arguments in a text document, the relations between them, and the internal structure of each argument. By integrating argumentation mining into writing environments, humans will be able to inspect their text for plausibility and to improve the quality of their argumentation.

A minimal definition of an argument is a set of statements consisting of 3 parts: a conclusion, premises, and an inference (Walton, 2009). Alternatively, an argument has been defined as a statement with 3 components: a claim or point of view that is argued, the actual argument or evidence, and a statement that links the claim to the evidence and makes the function of the argument clear (Moens, 2014).

Palau (2008) stated that argumentation detection can help facilitate the understanding of argumentative paragraphs, identify important information, increase the possibilities for indexing and document search, and represent reasoning systems. The classification and visualization of argument components has several advantages, such as showing clear, strong, and well-organized arguments. It also facilitates the evaluation of opinions, facilitates the understanding of others' opinions, supports the teaching of general reasoning, and helps in teaching critical thinking. Thus, achieving better accuracy in classifying argument components becomes a crucial problem.

In this work, we propose some new features on top of the state-of-the-art research in argumentation mining. We implemented 68 sub-features grouped into 8 main categories of features: structural, lexical, contextual, indicator, prompt similarity, syntactic, word embedding, and discourse. We also provide an accuracy comparison with previous systems related to our work. Our approach consists of two main steps. First, we performed component identification, i.e., the detection of argument components: we separated argumentative from non-argumentative text units and identified the presence of argument components. Second, we performed component classification, i.e., classifying each argument component as a major claim, claim, premise, or non-argumentative.

2. RELATED WORKS
There are several works related to this research, specifically in the field of argument detection and classification. Moens, Boiy, Palau, and Reed (2007) researched the automatic detection of arguments in legal texts. They used lexical, syntactic, semantic, and discourse features, with the Araucaria corpus as the dataset and multinomial naïve Bayes and a maximum entropy model as classifiers. As a result, they obtained 74% accuracy with all features on a variety of texts and 68% on legal texts. The detection and classification of argument components and the identification of argument structure were proposed by Palau and Moens (2009). They used the Araucaria corpus and European Court of Human Rights (ECHR) documents as data and feature extraction as the method. This research obtained 73% accuracy on Araucaria and 80% on ECHR; the accuracy was 74.07% for premise and conclusion classification, and 60% for detecting the argument structure. Lippi and Torroni (2015) proposed several methods to detect claims. They used the IBM corpus and 90 persuasive essays, achieving 71.4% accuracy on the 90 persuasive essays and 20.6% on the IBM corpus. Al-Khatib et al. (2016) proposed a distant supervision approach for automatically classifying argumentative parts of texts from an online debate portal. They used the Webis-Debate-16 corpus and performed a cross-domain comparison with the 90 persuasive essays and a web discourse corpus. This research achieved 66.8% accuracy on the 90 persuasive essays, 87.7% on the web discourse corpus, and 91.8% on Webis-Debate-16. In the cross-domain comparison, the highest accuracy, 84.4%, was obtained by training on the web discourse corpus and testing on Webis-Debate-16.
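The two-step approach described in the introduction, component identification (argumentative vs. non-argumentative units) followed by component classification (major claim, claim, premise), can be sketched as a simple two-stage pipeline. This is only a minimal illustration under assumptions: the toy sentences, labels, TF-IDF features, and linear SVM below are stand-ins, not the feature set or setup actually used in this paper.

```python
# Minimal two-stage sketch: (1) identify argumentative units,
# (2) classify the identified units by argument component type.
# Toy data and TF-IDF features are illustrative assumptions only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

units = [
    "We should ban plastic bags because they pollute the oceans.",
    "Plastic bags are convenient for carrying groceries.",
    "The weather was sunny last Tuesday.",
    "Therefore, single-use plastics must be restricted.",
]
is_argumentative = [1, 1, 0, 1]                    # stage 1 labels
component_type = ["claim", "premise", "claim"]     # stage 2 labels (argumentative units only)

# Stage 1: argumentative vs. non-argumentative text units.
stage1 = make_pipeline(TfidfVectorizer(), LinearSVC())
stage1.fit(units, is_argumentative)

# Stage 2: component type, trained only on the argumentative units.
arg_units = [u for u, y in zip(units, is_argumentative) if y == 1]
stage2 = make_pipeline(TfidfVectorizer(), LinearSVC())
stage2.fit(arg_units, component_type)

def classify(unit: str) -> str:
    """Return 'non-argumentative' or the predicted component type."""
    if stage1.predict([unit])[0] == 0:
        return "non-argumentative"
    return stage2.predict([unit])[0]

print(classify("They pollute the oceans."))
```

With a real corpus, stage 2 would of course be trained and evaluated on annotated components rather than these four toy sentences.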




 18th Workshop on Computational Models of Natural Argument
 Floris Bex, Floriana Grasso, Nancy Green (eds)
 16th July 2017, London, UK
Another direction, classifying arguments by identifying argumentation schemes, was pursued by Feng and Hirst (2011). They used the Araucaria database, feature extraction, and two classification methods. The features used in this research were general and scheme-specific features. For one-against-others classification, the highest accuracy was 90.8% (target scheme: reasoning) and the lowest was 63.2% (target scheme: classification). For pairwise classification, the highest accuracy was 98.3% (classification-reasoning) and the lowest was 64.2% (classification-consequences).

To identify argumentative discourse, some researchers carried out annotation studies to create corpora. Stab and Gurevych (2014a) conducted an annotation study and created a corpus of 90 persuasive essays. They continued this research by identifying argument components and argumentative relations in persuasive essays. A Support Vector Machine (SVM) was used, obtaining 77.3% accuracy, with structural features performing best. In further research, they created an approach to parse argumentation structures in persuasive essays (Stab and Gurevych, 2016). They created a corpus of 402 persuasive essays and extracted features for argument component identification, argument component classification, argumentative relation identification, tree generation, and stance recognition. They obtained 77.3% accuracy, and structural features were again the best performing. They also proposed an approach to recognize the absence of opposing arguments in persuasive essays, using both the 90-essay and the 402-essay corpora, and obtained 75.6% accuracy; the combination of unigrams, production rules, and adversative transitions obtained the highest accuracy among all combinations. Habernal and Gurevych (2016) annotated and automatically analyzed arguments in user-generated web discourse, extracting 5 (five) feature sets to detect argument components, and obtained 75.4% accuracy.

Some researchers focused on approaches to identify argumentation structures. Peldszus (2014) proposed an approach to automatically identify argumentation structures in microtexts at various levels of granularity. They used 115 microtexts as the dataset, extracted features, and compared several types of classifiers. The best performing classifiers were the Support Vector Machine (SVM) and the Maximum Entropy classifier (MaxEnt), obtaining 64% and 63% accuracy, respectively. The best features for obtaining high accuracy were lemma unigrams and lemma bigrams. Lawrence and Reed (2015) proposed 3 (three) methods to extract argumentation structures. They used the AIFdb corpus and implemented discourse indicators, topic similarity, and schematic structure as methods. The combination of these methods reached 83% accuracy, with schematic structure as the best performing method.

Further applications of argumentation detection and classification, such as assessing the quality of arguments, have been pursued by some researchers. Wachsmuth, Al-Khatib, and Stein (2016) investigated mining argumentation structure to assess the argumentation quality of persuasive essays. They used a corpus containing essays from the International Corpus of Learner English, extracted features, and classified argument components into ADU types: thesis, conclusion, premise, and none. They obtained 74.5% accuracy, with sentence position as the best performing feature.

3. METHODS
3.1 Data
We utilized the corpus of persuasive essays compiled by Stab and Gurevych (2016). It consists of 402 annotated persuasive essays on different kinds of topics. The corpus contains argument component annotations at the clause level as well as argumentative relations and argument structure at different levels of discourse. It also contains annotations of the major claim, claims, and premises in each essay, and comprises 7,116 sentences with 147,271 tokens.

3.2 Current Features
We implemented 68 sub-features categorized into 8 groups: structural, lexical, indicator, contextual, syntactic, prompt similarity, word embedding, and discourse features. The features described in this section were combined from several studies on argument component classification.

3.2.1 Structural Features
Structural features identify argument components based on the structure of the text. The covering sentence is the sentence that contains the argument component. The structural group includes 3 sub-groups: token statistics, location, and punctuation. As token statistics features, we defined the number of tokens in the argument component, the number of tokens in the covering sentence, the number of tokens preceding and following the argument component in the covering sentence, the token ratio between the covering sentence and the argument component, the number of tokens in the covering paragraph, the number of sentences preceding and following the covering sentence in its paragraph, the token ratio between the covering sentence and the covering paragraph, the token ratio between the covering sentence and the essay, the average number of tokens per sentence, and this ratio together with a Boolean feature indicating whether the argument component covers all tokens of its covering sentence. For location, we defined a set of location-based features exploiting the structural properties of the essay: 4 Boolean features indicating whether the argument component is present in the introduction or conclusion of the essay and whether it is present in the first or last sentence of a paragraph. Second, we added the position of the covering sentence in the essay and the position of the covering sentence in the paragraph as numeric features. We also computed the ratio of the covering sentence to the paragraph, the ratio of the covering sentence to the essay, and the ratio of the paragraph to the essay. For punctuation, we defined a set of punctuation-based features to identify characteristics of argument components. These features return the number of punctuation marks in the covering sentence, the number of punctuation marks in the argument component, the number of punctuation marks preceding and following the argument component in its covering sentence, and a Boolean feature indicating whether the sentence ends with a question mark.

3.2.2 Lexical Features
These features are defined by N-grams, POS N-grams, verbs, adverbs, modal auxiliaries, comparative and superlative adjectives, the ratio of pronouns, and word couples.

3.2.3 Indicator Features
These are Boolean features indicating the presence of question indicators, time indicators, evidence indicators, conclusion indicators, compare-and-contrast indicators, and cue phrases. We also used 55 discourse markers, modelling each as a Boolean feature set to true if it is present in the covering sentence. The discourse markers were taken from the Penn Discourse Treebank 2.0 Annotation Manual (Prasad et al., 2007). Furthermore, we also defined 4 (four)




Boolean features that indicate the presence of type indicators, including forward indicators, backward indicators, thesis indicators, and rebuttal indicators. In addition, we defined 5 (five) Boolean features to identify possessive pronouns (I, me, mine, myself, my) in the covering sentence.

3.2.4 Contextual Features
These features return the number of punctuation marks, the number of tokens, and the number of sub-clauses in the sentences preceding and following the covering sentence, and the number of sentences preceding and following the covering sentence. We also defined Boolean features indicating the presence of modal verbs, question indicators, comparative and superlative adjectives, and type indicators. In addition, we defined 4 (four) Boolean and numeric features indicating whether a shared noun or shared verb is present in the introduction or conclusion of the essay.

3.2.5 Syntactic Features
We count the number of sub-clauses in each sentence and return a numeric value. We also count the depth of the parse tree, extract the production rules, and identify whether the sentence is in the past tense, the present tense, or neither.

3.2.6 Prompt Similarity Features
These features compute the cosine similarity between the current sentence and the prompt, the first sentence of each paragraph, the last sentence of each paragraph, the preceding sentence, and the following sentence.

3.2.7 Word Embedding Features
These features capture the vector representation of each word. GloVe was used to obtain the vector representation of each word. We compute the average of the vector values per argument component.

3.2.8 Discourse Features
We implemented discourse features which return: (1) the counts of explicit and implicit relations in a sentence and which relation type occurs most, and (2) the ratio of explicit to implicit relations. Explicit discourse connectives are drawn primarily from well-defined syntactic classes, while implicit discourse connectives are inserted between paragraph-internal adjacent sentence pairs not related explicitly by any of the syntactically defined set of explicit connectives.

3.3 Additional Features
To explore the classification of argument components further, we defined some features which are quite promising for boosting classification accuracy. Our additional features cover 7 main categories: structural, lexical, indicator, contextual, syntactic, prompt similarity, and word embedding features.
• Structural features: the number of tokens in the covering paragraph, the number of sentences preceding and following the covering sentence in the covering paragraph, and the position of the covering sentence in the paragraph.
• Lexical features: POS N-grams and word couples.
• Indicator features: forward, backward, rebuttal, and thesis indicators, and cue phrases.
• Contextual features: the types of indicators in the context, the number of shared nouns and shared verbs present in the introduction and conclusion of the essay, and 4 binary features indicating whether shared nouns and verbs are present in the introduction or conclusion of the essay.
• Syntactic feature: the POS distribution.
• Prompt similarity feature: the cosine similarity between the current sentence and the prompt.
• Word embedding feature: the vector representation of each word.

4. RESULTS AND DISCUSSION
4.1 Performance
There are 8 categories of features implemented for feature extraction: structural, indicator, contextual, lexical, syntactic, prompt similarity, word embedding, and discourse, with a total of 68 sub-features. We used a Support Vector Machine (SVM) as the classifier with 10-fold cross validation and utilized the corpus of 402 annotated persuasive essays by Stab and Gurevych (2016). The accuracy of this system was 79.96%, which is higher than that of the argument component detection and classification systems of previous works, as shown in Table 1. Even though this comparison is not a proper objective evaluation due to task differences among the works, our accuracy is quite promising in surpassing previous works, especially Stab and Gurevych (2014b).

                Table 1. Previous works' performance

    Related Work                               Accuracy
    Moens et al. (2007)                        74%
    Palau and Moens (2009)                     74.04%
    Stab and Gurevych (2014b)                  77.3%
    Lippi and Torroni (2015)                   71.4%
    Stab and Gurevych (2016)                   77.3%
    Wachsmuth, Al-Khatib, and Stein (2016)     74.5%
    Habernal and Gurevych (2016)               75.4%
    Al-Khatib et al. (2016)                    66.8%

     Table 2. Confusion matrix of the system accuracy results
          (SVM) for argument component classification

          MC       Cl       Pr        No
    MC    578      130      43        0
    Cl    226      309      970       1
    Pr    28       147      3656      1
    No    0        0        0         1638

Table 2 shows that the system correctly identifies 578 major claims (MC), 309 claims (Cl), 3656 premises (Pr), and 1638 non-argumentative units (No). The errors occurred mostly in identifying claims, most of which were identified as premises. The accuracy in identifying each component was 76.96% for major claims, 20.52% for claims, 95.41% for premises, and 100% for non-argumentative units. We suspect the accuracy in identifying claims was very low due to class imbalance, as the claim class had the least data. Besides using 10-fold cross validation for training, we also conducted experiments using 5-fold cross validation, obtaining 79.74% accuracy.

           Table 3. Accuracy result of each feature category

    Feature Name    Accuracy    Feature Name         Accuracy
    Structural      77.83%      Syntactic            51.35%
    Indicator       54.73%      Prompt Similarity    54.79%
    Contextual      63.10%      Word Embedding       49.46%
    Lexical         61.06%      Discourse            49.41%
    All Features                79.96%
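The prompt similarity and word embedding features described in Sections 3.2.6 and 3.2.7 reduce to two small computations: the cosine similarity between two sentence vectors, and the average of per-word vectors over an argument component. The sketch below assumes sentences are represented as averaged word embeddings (the paper does not fully specify the sentence representation), and the tiny 3-dimensional vectors stand in for real GloVe embeddings.

```python
# Sketch of the word-embedding feature (average of word vectors) and the
# prompt-similarity feature (cosine between sentence vectors). The tiny
# 3-d "embeddings" below are stand-ins for real GloVe vectors.
import math

embeddings = {  # toy stand-ins for GloVe vectors
    "school": [0.9, 0.1, 0.0],
    "uniforms": [0.8, 0.2, 0.1],
    "should": [0.1, 0.9, 0.2],
    "be": [0.1, 0.8, 0.3],
    "mandatory": [0.7, 0.3, 0.2],
}

def avg_embedding(tokens):
    """Word-embedding feature: mean of the word vectors (zero vector if no token is known)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(vecs) for dim in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is the zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

prompt = "should uniforms be mandatory".split()
sentence = "school uniforms should be mandatory".split()

# Prompt-similarity feature: cosine between the sentence and prompt vectors.
similarity = cosine(avg_embedding(sentence), avg_embedding(prompt))
print(round(similarity, 3))
```

The same `cosine` helper also covers the similarities to the first and last sentences of each paragraph and to the neighboring sentences listed in Section 3.2.6.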




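The per-group evaluation protocol behind Table 3, training an SVM on one feature group at a time and scoring it with 10-fold cross-validation, can be sketched with scikit-learn as below. The random feature matrix, labels, and column groups are placeholders; the paper's actual 68 sub-features are not reproduced here.

```python
# Sketch of per-feature-group evaluation: an SVM scored with 10-fold
# cross-validation on each group of feature columns separately, then on
# all features combined. The random data and column groups are placeholders.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # 200 units, 10 placeholder feature columns
y = rng.integers(0, 4, size=200)      # 4 classes: MC, Cl, Pr, No

feature_groups = {                    # placeholder column groups
    "structural": [0, 1, 2],
    "lexical": [3, 4, 5],
    "contextual": [6, 7],
    "discourse": [8, 9],
}

# Score each feature group in isolation.
for name, cols in feature_groups.items():
    scores = cross_val_score(SVC(kernel="linear"), X[:, cols], y, cv=10)
    print(f"{name}: {scores.mean():.4f}")

# Score all features combined.
all_scores = cross_val_score(SVC(kernel="linear"), X, y, cv=10)
print(f"all features: {all_scores.mean():.4f}")
```

On this random data the scores hover around chance level; with real extracted features the same loop would reproduce a comparison like Table 3.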
We conducted experiments using each feature group to determine which feature sets were significant in classifying argument components. Based on Table 3, the best feature set for classifying argument components is the structural features, with 77.83% accuracy. Contextual and lexical features were, respectively, the next most significant features.

4.2 Combining the Features
As the next experiment, we attempted to combine all features, to identify which feature combinations have the greatest and the least impact on improving the system's accuracy.

  Table 4. Accuracy results of feature combinations, each without
                       one feature category

    Feature Name         Accuracy    Feature Name                Accuracy
    Without Structural   69.74%      Without Syntactic           78.21%
    Without Lexical      77.72%      Without Prompt Similarity   79.93%
    Without Indicators   77.98%      Without Word Embedding      78.48%
    Without Contextual   78.05%      Without Discourse           79.98%

Based on Table 4, we can conclude that the most influential feature group is structural, because the combination of all features without the structural features has the lowest accuracy (69.74%), while the least influential is discourse, as the combination without the discourse features reaches 79.98% accuracy.

From the 8 feature-combination trials, 7 showed significant accuracy, achieving between 77.7% and 79.9%. This indicates that combining features produces higher accuracy than that of previous works (Table 1). In addition, the experiments show that the system's accuracy decreased significantly when features were extracted without the structural features. Therefore, we also ran an experiment combining the 3 (three) feature groups that achieved the highest accuracy, i.e., the structural, lexical, and contextual features, which produced 77.87% accuracy.

4.3 Comparing Each Group of Features
We conducted further experiments comparing the system's accuracy using the features presented by Stab and Gurevych (2014b), the handcrafted features proposed by the authors, and the additional features from previous works. The system was trained on the same corpus of 402 annotated persuasive essays compiled by Stab and Gurevych (2016).

Stab and Gurevych (2014b) implemented structural, indicator, contextual, lexical, and syntactic features, with a total of 28 sub-features. Our system's accuracy using feature extraction based on Stab and Gurevych (2014b) is 76.32% (Table 5), while the original accuracy reported in their research was 77.3% using 90 persuasive essays, with the highest accuracy achieved by structural features. The difference in results may be caused by the different amount of training data.

   Table 5. Accuracy result of implementing the features of Stab
                       and Gurevych (2014b)

    Feature Name    Accuracy
    Structural      74.33%
    Indicator       61.11%
    Contextual      52.38%
    Lexical         58.69%
    Syntactic       50.94%
    All Features    76.32%

We proposed some handcrafted features to develop an algorithm to identify and classify argument components and to increase the system's accuracy. This experiment used 24 sub-features, which produced an accuracy of 68.46%. In addition, we ran the system using each feature category to identify each category's performance (Table 6).

    Table 6. Accuracy result of proposed handcrafted features

    Feature Name         Accuracy
    Structural           63.81%
    Indicator            49.45%
    Contextual           59.94%
    Lexical              49.58%
    Prompt Similarity    54.70%
    Discourse            49.43%
    All Features         68.46%

From the results presented in Table 6, the system achieved 68.46% accuracy, with the highest accuracy achieved by the structural features, followed by the contextual and prompt similarity features as the second and third best performing features.

The experiments also implemented the additional features obtained from previous state-of-the-art works. There are 16 additional sub-features implemented in this scenario. Based on Table 7, the system achieved 71.08% accuracy, with the most significant accuracy achieved by the structural and lexical features. The word embedding feature was the worst performing feature in this experiment.

  Table 7. Accuracy result of additional features from state-of-
                       the-art research

    Feature Name         Accuracy
    Structural           61.15%
    Indicator            53.95%
    Contextual           50.69%
    Lexical              59.27%
    Syntactic            50.72%
    Prompt Similarity    54.79%
    Word Embedding       49.46%
    All Features         71.08%

5. CONCLUSIONS
After all the experiments we have done to detect and classify argument components, we found that 79.96% accuracy was achieved by implementing the full feature set. We defined 68 sub-features, summarized into 8 categories of features: structural, lexical, indicator, contextual, syntactic, word embedding, prompt similarity, and discourse features. We found that structural features were the best feature group that had the most

significant impact on the system's accuracy, reaching 77.83%
accuracy. The other significant feature groups were contextual and
lexical, with accuracies of 63.10% and 61.06%, respectively.

The most significant feature combination was the combination of all
features except the discourse features. This combination obtained
79.98% accuracy, which was higher than the accuracy of the full
feature set. The combination of all features except the structural
features performed worst, so we conclude that structural features
were the most significant group while discourse features were not.
In addition, the combination of the three structural, contextual,
and lexical feature groups also performed well, with 77.87%
accuracy. The features proposed by Stab and Gurevych (2014b)
achieved 76.32% accuracy. Every feature-comparison experiment
obtained more than 67% accuracy, which means that each configuration
could identify argument components in more than 67% of cases.

Since the experiments showed that the most significant features were
structural, contextual, and lexical, we plan to develop these groups
further in our next experiments. We also expect that a larger amount
of training data, covering more varied topics and characteristics,
would increase the accuracy of the system. In addition, we still
need to define other features or methods that can help to further
differentiate premises from claims.

6. ACKNOWLEDGMENTS
This research was supported by Bina Nusantara University and partly
supported by a research grant from the Directorate General of
Research and Development Reinforcement, Ministry of Research,
Technology and Higher Education of the Republic of Indonesia.

7. REFERENCES
[1] Al-Khatib, K., Wachsmuth, H., Hagen, M., Kohler, J., and Stein,
    B. 2016. Cross-domain mining of argumentative text through
    distant supervision. In Proceedings of the 15th Conference of
    the North American Chapter of the Association for Computational
    Linguistics (NAACL 2016), San Diego, CA, USA.
[2] Feng, V.W. and Hirst, G. 2011. Classifying arguments by scheme.
    In Proceedings of the 49th Annual Meeting of the Association
    for Computational Linguistics, pp. 987-996, Portland, Oregon.
[3] Habernal, I. and Gurevych, I. 2015. Exploiting debate portals
    for semi-supervised argumentation mining in user-generated web
    discourse. In Proceedings of the 2015 Conference on Empirical
    Methods in Natural Language Processing (EMNLP 2015),
    pp. 2127-2137, Lisbon, Portugal.
[4] Habernal, I. and Gurevych, I. 2016. Argumentation mining in
    user-generated web discourse. Computational Linguistics, in
    press.
[5] Lawrence, J. and Reed, C. 2015. Combining argument mining
    techniques. In Proceedings of the 2nd Workshop on Argumentation
    Mining, pp. 127-136, Denver, Colorado.
[6] Lippi, M. and Torroni, P. 2015. Context-independent claim
    detection for argumentation mining. In Proceedings of the
    Twenty-Fourth International Joint Conference on Artificial
    Intelligence (IJCAI 2015).
[7] Moens, M.F. 2014. Tutorial: Argumentation Mining. Belgium.
[8] Moens, M.F., Boiy, E., Palau, R.M., and Reed, C. 2007.
    Automatic detection of arguments in legal texts. In Proceedings
    of the 11th International Conference on Artificial Intelligence
    and Law (ICAIL '07), pp. 225-230, Stanford, CA, USA.
[9] Palau, R.M. 2008. Automatic argumentation detection. Project
    ACILA - Automatic Detection and Classification of Arguments in
    a Legal Case, Leuven, Belgium.
[10] Palau, R.M. and Moens, M.F. 2009. Argumentation mining: the
    detection, classification and structure of arguments in text.
    In Proceedings of the 12th International Conference on
    Artificial Intelligence and Law (ICAIL '09), pp. 98-107,
    Barcelona, Spain.
[11] Peldszus, A. 2014. Towards segment-based recognition of
    argumentation structure in short texts. In Proceedings of the
    First Workshop on Argumentation Mining, pp. 88-97, Baltimore,
    Maryland, USA.
[12] Peldszus, A. and Stede, M. 2013. From argument diagrams to
    argumentation mining in texts: a survey. International Journal
    of Cognitive Informatics and Natural Intelligence, 7(1),
    pp. 1-31.
[13] Prasad, R., Miltsakaki, E., Dinesh, N., Lee, A., Joshi, A.,
    Robaldo, L., and Webber, B.L. 2007. The Penn Discourse Treebank
    2.0 annotation manual. Technical report, Institute for Research
    in Cognitive Science, University of Pennsylvania.
[14] Stab, C. and Gurevych, I. 2014a. Annotating argument
    components and relations in persuasive essays. In Proceedings
    of the 25th International Conference on Computational
    Linguistics (COLING 2014), pp. 1501-1510, Dublin, Ireland.
[15] Stab, C. and Gurevych, I. 2014b. Identifying argumentative
    discourse structures in persuasive essays. In Proceedings of
    the Conference on Empirical Methods in Natural Language
    Processing (EMNLP 2014), pp. 46-56, Doha, Qatar.
[16] Stab, C. and Gurevych, I. 2016. Parsing argumentation
    structures in persuasive essays. arXiv preprint, April 2016.
    Technische Universität Darmstadt, Germany.
[17] Stab, C. and Gurevych, I. 2016. Recognizing the absence of
    opposing arguments in persuasive essays. In Proceedings of the
    3rd Workshop on Argument Mining, held in conjunction with the
    2016 Annual Meeting of the Association for Computational
    Linguistics (ACL 2016), pp. 113-118.
[18] Stab, C. and Habernal, I. 2015. Detecting argument components
    and structures. In Report of Dagstuhl Seminar on Debating
    Technologies (15512), Vol. 5, p. 32.
[19] Toulmin, S.E. 1958. The Uses of Argument. Cambridge University
    Press.
[20] Wachsmuth, H., Al-Khatib, K., and Stein, B. 2016. Using
    argument mining to assess the argumentation quality of essays.
    Bauhaus-Universität Weimar, Germany.
[21] Walton, D. 2009. Argumentation theory: a very short
    introduction. In Argumentation in Artificial Intelligence,
    pp. 1-24.



