=Paper=
{{Paper
|id=Vol-2048/paper12
|storemode=property
|title=An Attempt to Combine Features in Classifying Argument Components in Persuasive Essays
|pdfUrl=https://ceur-ws.org/Vol-2048/paper12.pdf
|volume=Vol-2048
|authors=Yunda Desilia,Velizya Thasya Utami,Cecilia Arta,Derwin Suhartono
|dblpUrl=https://dblp.org/rec/conf/icail/DesiliaUAS17
}}
==An Attempt to Combine Features in Classifying Argument Components in Persuasive Essays==
An attempt to combine features in classifying argument components in persuasive essays
Yunda Desilia, Velizya Thasya Utami, Cecilia Arta, Derwin Suhartono
School of Computer Science
Bina Nusantara University
Jakarta, Indonesia
{yunda.desilia, velizya.utami, cecilia.arta}@binus.ac.id, dsuhartono@binus.edu
ABSTRACT
So far, several approaches have been proposed for detecting and classifying argumentation in persuasive essays. In this paper, we propose some new features on top of the state-of-the-art research in argumentation mining. We grouped 68 features into 8 categories: structural, lexical, indicator, contextual, syntactic, prompt similarity, word embedding, and discourse features. Besides handcrafted features, we also utilized word embeddings as features. At the end of this paper, we present a comparison between each group of features in classifying the argument components. 402 persuasive essays were utilized. We found that structural features were the most significant feature group while discourse features were the least. After combining all features, we obtained an accuracy of 79.96%, slightly outperforming the state-of-the-art accuracy of 77.3%.

Keywords
argument component, feature, word embedding, argumentation mining, persuasive essay

1. INTRODUCTION
Argumentation is a process of building, exchanging, and evaluating arguments in terms of their interaction with other arguments. An argument is a set of premises, or evidence/facts, which are given to support a claim (Palau and Moens, 2009). The objective of argumentation is to make the audience believe that the stated idea, thought, or opinion is true and proven. Argumentation mining aims to detect the arguments in a text document, the relations between them, and the internal structure of each argument. By integrating argumentation mining into writing environments, people will be able to inspect their text for plausibility and to improve the quality of their argumentation.

A minimal definition of an argument is a set of statements that consists of 3 parts: conclusion, premises, and inference (Walton, 2009). On the other hand, an argument has also been described as a statement with 3 components: a claim/point of view that is argued, the actual argument/evidence, and a statement that links the claim to the argument and makes sure the function of the argument can be understood (Moens, 2014).

Palau (2008) stated that argumentation detection can help to facilitate the understanding of argumentative paragraphs, demonstrate good identification of important information, increase the possibility of indexing or document search, and represent reasoning systems. The classification and visualization of argument components has several advantages, such as showing clear, strong, and structured/organized arguments. It also facilitates the evaluation of opinions, facilitates the understanding of others' opinions, helps in teaching general reasoning, and helps in teaching critical thinking. Thus, achieving better accuracy in classifying argument components becomes a compulsory problem.

In this work, we propose some new features on top of the state-of-the-art research in argumentation mining. We implemented 68 sub-features that are grouped into 8 main categories: structural, lexical, contextual, indicator, prompt similarity, syntactic, word embedding, and discourse features. We also provide an accuracy comparison to previous systems related to our work. Our proposed approach consists of two main steps. First, we performed component identification, which includes the identification and detection of argument components: we separated argumentative text units from non-argumentative text units and identified the presence of argument components. Second, we performed component classification, which classifies each argument component as major claim, claim, premise, or non-argumentative.

2. RELATED WORKS
There are several works related to this research, specifically in the field of argument detection and classification. Moens, Boiy, Palau, and Reed (2007) studied the automatic detection of arguments in legal texts. They used lexical, syntactic, semantic, and discourse features, with the Araucaria corpus as the dataset and multinomial naive Bayes and maximum entropy models as the classifiers. As a result, they obtained 74% accuracy with all features on a variety of texts and 68% on legal texts. The detection and classification of each argument component and the identification of argument structure was proposed by Palau and Moens (2009). They used the Araucaria corpus and European Court of Human Rights (ECHR) decisions as the data and feature extraction as the method. This research obtained 73% accuracy on Araucaria and 80% accuracy on ECHR. In addition, the accuracy was 74.07% for premise and conclusion classification, and 60% for detecting the argument structure. Lippi and Torroni (2015) proposed several methods to detect claims. They used the IBM corpus and 90 persuasive essays. As a result, they achieved 71.4% accuracy on the 90 persuasive essays and 20.6% on the IBM corpus. Al-Khatib et al. (2016) proposed a distant supervision approach for automatically classifying argumentative parts of text from an online debate portal. They used the Webis-Debate-16 corpus and did a cross-domain comparison with 90 persuasive essays and a web discourse corpus. This research achieved 66.8% accuracy on the 90 persuasive essays corpus, 87.7% on the web discourse corpus, and 91.8% on Webis-Debate-16. In the cross-domain comparison, the highest accuracy, 84.4%, was obtained by training on the web discourse corpus and testing on Webis-Debate-16.
18th Workshop on Computational Models of Natural Argument 71
Floris Bex, Floriana Grasso, Nancy Green (eds)
16th July 2017, London, UK
Another focus, classifying arguments by identifying argumentation schemes, was pursued by Feng and Hirst (2011). They used the Araucaria database, feature extraction, and two classification settings. The features used in this research were general and scheme-specific features. For one-against-others classification, the highest accuracy was 90.8% for the reasoning scheme, while the lowest accuracy was 63.2% for the classification scheme. For pairwise classification, the highest accuracy was 98.3% for classification-reasoning and the lowest accuracy was 64.2% for classification-consequences.

To identify argumentative discourse, some researchers conducted annotation studies to create corpora. Stab and Gurevych (2014a) conducted such a study and created a corpus of 90 persuasive essays. They continued the research by identifying the argument components and the argumentative relations in persuasive essays. A Support Vector Machine (SVM) was used, obtaining 77.3% accuracy, with structural features as the best performing group. In further research, they created an approach to parse the argumentation structures in persuasive essays (Stab and Gurevych, 2016). They created a corpus of 402 persuasive essays and extracted features to identify argument components, classify argument components, identify argumentative relations, generate the argumentation tree, and recognize stance. They obtained 77.3% accuracy, and structural features were again the best performing group. They also proposed an approach to recognize the absence of opposing arguments in persuasive essays, using both the 90-essay and 402-essay corpora, and obtained 75.6% accuracy; the combination of unigrams, production rules, and adversative transitions obtained the highest accuracy among all combinations. Habernal and Gurevych (2016) annotated and automatically analyzed arguments in user-generated web discourse, extracting 5 (five) feature sets to detect argument components, and obtained 75.4% accuracy.

Some researchers focused on approaches to identify argumentation structures. Peldszus (2014) proposed an approach to automatically identify argumentation structures in microtexts at various levels of granularity. They used 115 microtexts as the dataset, extracted features, and compared several types of classifiers. The best performing classifiers were Support Vector Machine (SVM) and Maximum Entropy (MaxEnt): SVM obtained 64% accuracy and MaxEnt obtained 63%. The best features were lemma unigrams and lemma bigrams. Lawrence and Reed (2015) proposed 3 (three) methods to extract argumentation structures. They used the AIFdb corpus and implemented discourse indicators, topic similarity, and schematic structure as the methods. The combination of those methods reached 83% accuracy, with schematic structure as the best performing method.

Further work on argumentation detection and classification, such as assessing the quality of arguments, has also been done. Wachsmuth, Al-Khatib, and Stein (2016) investigated mining structure to assess the argumentation quality of persuasive essays. They used a corpus of essays from the International Corpus of Learner English, extracted features, and classified argument components into ADU types: thesis, conclusion, premise, and none. They obtained 74.5% accuracy, with sentence position as the best performing feature.

3. METHODS

3.1 Data
We utilized a corpus of persuasive essays compiled by Stab and Gurevych (2016). It consists of 402 annotated persuasive essays on different kinds of topics. This corpus contains argument component annotations at the clause level as well as argumentative relations and argument structure at different levels of discourse. It also contains annotations of the major claims, claims, and premises in each essay, and consists of 7,116 sentences with 147,271 tokens.

3.2 Current Features
We implemented 68 sub-features categorized into 8 groups: structural, lexical, indicator, contextual, syntactic, prompt similarity, word embedding, and discourse features. The features described in this section were combined from several studies on argument component classification.

3.2.1 Structural Features
Structural features identify argument components based on the structure of the text. The covering sentence is the sentence that contains the argument component. Structural features include 3 sub-groups: token statistics, location, and punctuation. As token statistics features, we defined the number of tokens in the argument component, the number of tokens in the covering sentence, the numbers of tokens preceding and following the argument component in the covering sentence, the token ratio between covering sentence and argument component, the number of tokens in the covering paragraph, the numbers of covering sentences in the preceding and following paragraphs, the token ratio between covering sentence and covering paragraph, the token ratio between covering sentence and essay, the average number of tokens per sentence, and a Boolean feature that indicates whether the argument component covers all tokens of its covering sentence. For location, we defined a set of location-based features to exploit the structural properties of the essay: 4 Boolean features that indicate whether the argument component is present in the introduction or conclusion of the essay and whether it is present in the first or the last sentence of a paragraph. Secondly, we added the positions of the covering sentence in the essay and in the paragraph as numeric features. We also counted the ratio of covering sentence to paragraph, the ratio of covering sentence to essay, and the ratio of paragraph to essay. For punctuation, we defined a set of punctuation-based features to identify characteristics of the argument component. These features return the number of punctuation marks in the covering sentence and in the argument component, the numbers of punctuation marks preceding and following the argument component in its covering sentence, and a Boolean feature that indicates whether the sentence ends with a question mark.

3.2.2 Lexical Features
These features are defined by N-grams, POS N-grams, verbs, adverbs, modal auxiliaries, comparative and superlative adjectives, the ratio of pronouns, and word couples.

3.2.3 Indicator Features
Boolean features indicating the presence of question indicators, time indicators, evidence indicators, conclusion indicators, compare-and-contrast indicators, and cue phrases. We also used 55 discourse markers, modelling each as a Boolean feature set to true if it is present in the covering sentence. The discourse markers were taken from the Penn Discourse Treebank 2.0 Annotation Manual (Prasad et al., 2007). Furthermore, we defined 4 (four) Boolean features that indicate the presence of type indicators, including forward indicators, backward indicators, thesis indicators, and rebuttal indicators. In addition, we defined 5 (five) Boolean features to identify first-person pronouns (I, me, mine, myself, my) in the covering sentence.

3.2.4 Contextual Features
These features return the numbers of punctuation marks, tokens, and sub-clauses in the sentences preceding and following the covering sentence, and the numbers of covering sentences preceding and following the covering sentence. We also defined Boolean features indicating the presence of modal verbs, question indicators, comparative and superlative adjectives, and type indicators. In addition, we defined 4 (four) Boolean and numeric features that indicate whether a shared noun or shared verb is present in the introduction or conclusion of the essay.

3.2.5 Syntactic Features
We count the number of sub-clauses in each sentence and return a numeric value. We also count the depth of the parse tree, extract the production rules, and identify whether the sentence is in past tense, present tense, or neither.

3.2.6 Prompt Similarity Features
These features compute the cosine similarity between the current sentence and the prompt, the first sentence in each paragraph, the last sentence in each paragraph, its preceding sentence, and its following sentence.

3.2.7 Word Embedding Features
These features use the vector representation of each word. GloVe was used to obtain the vector representation of each word, and we compute the average of the word vectors per argument component.

3.2.8 Discourse Features
We implemented discourse features which return: (1) the counts of explicit and implicit relations in a sentence, together with which type occurs most; and (2) the ratio of explicit to implicit relations. Explicit discourse connectives are drawn primarily from well-defined syntactic classes, while implicit discourse connectives are inserted between paragraph-internal adjacent sentence pairs not related explicitly by any of the syntactically defined set of explicit connectives.

3.3 Additional Features
To explore further in classifying argument components, we defined some features which are quite promising for boosting classification accuracy. Our additional features cover 7 main categories: structural, lexical, indicator, contextual, syntactic, prompt similarity, and discourse features.
• Structural features: the number of tokens in the covering paragraph, the numbers of preceding and following covering sentences in the covering paragraph, and the position of the covering sentence in the paragraph.
• Lexical features: POS N-grams and word couples.
• Indicator features: forward, backward, rebuttal, and thesis indicators, and cue phrases.
• Contextual features: type indicators in context, the numbers of shared nouns and shared verbs present in the introduction and conclusion of the essay, and 4 binary features that indicate shared nouns and verbs present in the introduction or conclusion of the essay.
• Syntactic feature: POS distribution.
• Prompt similarity feature: the cosine similarity between the current sentence and the prompt.
• Word embedding feature: the vector representation of each word.

4. RESULTS AND DISCUSSION

4.1 Performance
There are 8 categories of features implemented for feature extraction: structural, indicator, contextual, lexical, syntactic, prompt similarity, word embedding, and discourse, with a total of 68 sub-features. We used a Support Vector Machine (SVM) as the classifier with 10-fold cross validation and utilized the corpus of 402 annotated persuasive essays by Stab and Gurevych (2016). The accuracy of this system was 79.96%, higher than that of the argument component detection and classification systems in previous works, as shown in Table 1. Even though this comparison is not a proper objective evaluation due to task differences among the systems, our accuracy is quite promising in surpassing previous works, especially Stab and Gurevych (2014b).

Table 1. Previous works performance

  Related Work                              Accuracy
  Moens et al. (2007)                       74%
  Palau and Moens (2009)                    74.04%
  Stab and Gurevych (2014b)                 77.3%
  Lippi and Torroni (2015)                  71.4%
  Stab and Gurevych (2016)                  77.3%
  Wachsmuth, Al-Khatib, and Stein (2016)    74.5%
  Habernal and Gurevych (2016)              75.4%
  Al-Khatib et al. (2016)                   66.8%

Table 2. Confusion matrix of the system accuracy results (SVM) for argument component classification

        MC    Cl    Pr    No
  MC   578   130    43     0
  Cl   226   309   970     1
  Pr    28   147  3656     1
  No     0     0     0  1638

Table 2 shows that the system correctly identifies 578 major claims (MC), 309 claims (Cl), 3656 premises (Pr), and 1638 non-argumentative units (No). The errors occurred mostly in identifying claims; most of them were classified as premises. The accuracy in identifying each component was 76.96% for major claim, 20.52% for claim, 95.41% for premise, and 100% for non-argumentative. We suspect the accuracy for claims was very low due to class imbalance, as claim had the least data. Besides 10-fold cross validation, we also conducted experiments using 5-fold cross validation, which yielded 79.74% accuracy.

Table 3. Accuracy result of each feature category

  Feature Name        Accuracy
  Structural          77.83%
  Indicator           54.73%
  Contextual          63.10%
  Lexical             61.06%
  Syntactic           51.35%
  Prompt Similarity   54.79%
  Word Embedding      49.46%
  Discourse           49.41%
  All Features        79.96%
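The per-class accuracies quoted in Section 4.1 follow directly from the confusion matrix in Table 2: each is the diagonal entry of a row divided by that row's total. A minimal sketch (the matrix values are taken from Table 2; the variable names are ours, not the paper's):

```python
# Per-class accuracy (recall) from the Table 2 confusion matrix.
# Rows are true classes, columns are predicted classes.
labels = ["MC", "Cl", "Pr", "No"]
confusion = [
    [578, 130,   43,    0],  # true major claims
    [226, 309,  970,    1],  # true claims
    [ 28, 147, 3656,    1],  # true premises
    [  0,   0,    0, 1638],  # true non-argumentative
]

for i, label in enumerate(labels):
    correct = confusion[i][i]   # diagonal entry: correctly classified
    total = sum(confusion[i])   # all instances of this true class
    print(f"{label}: {100 * correct / total:.2f}%")
# MC: 76.96%, Cl: 20.52%, Pr: 95.41%, No: 100.00%
```

The very low claim figure (309 of 1506 claims recovered) makes the class-imbalance explanation above concrete: most misclassified claims fall into the much larger premise class.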
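Section 4.1 scores the classifier with 10-fold cross validation; the bookkeeping behind that protocol can be sketched as follows. Only the fold splitting is shown (the SVM training step is left as a comment), and all names and sizes here are illustrative stand-ins, not the paper's data:

```python
# Sketch of the 10-fold cross-validation protocol used in Section 4.1.
# The paper trains an SVM over 68 sub-features per argument component;
# here only the fold bookkeeping is shown, with a placeholder for training.
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle item indices and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

n_components = 500  # stand-in for the number of argument components
folds = k_fold_indices(n_components, k=10)

for test_fold in folds:
    train = [j for f in folds if f is not test_fold for j in f]
    # Here one would extract the 68 sub-features for `train`, fit the
    # SVM, evaluate on `test_fold`, and finally average the 10 accuracies.
    assert len(train) + len(test_fold) == n_components
```

Each component is held out exactly once, so the averaged fold accuracies use every annotated unit for both training and testing without overlap within a fold.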
We conducted experiments using each feature group separately to capture which feature sets were significant in classifying the argument components. Based on Table 3, the best feature set for classifying argument components is the structural features, with 77.83% accuracy. Contextual and lexical features were, respectively, the next most significant groups.

4.2 Combining the Features
As the next experiment, we attempted to combine all features in order to identify which feature combination has the largest and the smallest impact on the system's accuracy.

Table 4. Accuracy result of feature combinations, each leaving out one feature category

  Feature Name               Accuracy
  Without Structural         69.74%
  Without Lexical            77.72%
  Without Indicators         77.98%
  Without Contextual         78.05%
  Without Syntactic          78.21%
  Without Prompt Similarity  79.93%
  Without Word Embedding     78.48%
  Without Discourse          79.98%

Based on Table 4, we can conclude that the most influential feature group is structural, because the combination of all features without structural has the lowest accuracy (69.74%), while the least influential group is discourse, as the combination without discourse features still reaches 79.98% accuracy.

Of the 8 leave-one-out trials, 7 maintained high accuracy, in the range of 77.7% to 79.9%. This indicates that combining features produces higher accuracy than the previous works (Table 1). In addition, we can see from the experiments that the system's accuracy decreased significantly when the structural features were left out. We therefore also ran an experiment combining the 3 (three) feature groups that achieved the highest individual accuracy, i.e. structural, lexical, and contextual features, which produced 77.87% accuracy.

4.3 Comparing Each Group of Features
We conducted further experiments comparing the system's accuracy when using the features presented by Stab and Gurevych (2014b), the handcrafted features proposed by the authors, and the additional features from previous works. The system was trained on the same corpus of 402 annotated persuasive essays compiled by Stab and Gurevych (2016).

Stab and Gurevych (2014b) implemented structural, indicator, contextual, lexical, and syntactic features, with a total of 28 sub-features. Our system's accuracy using feature extraction based on Stab and Gurevych (2014b) is 76.32% (Table 5), while the original accuracy of their research was 77.3% using 90 persuasive essays, with the highest accuracy achieved by structural features. The difference can be caused by the different amount of training data.

Table 5. Accuracy result of implementation of features by Stab and Gurevych (2014b)

  Feature Name   Accuracy
  Structural     74.33%
  Indicator      61.11%
  Contextual     52.38%
  Lexical        58.69%
  Syntactic      50.94%
  All Features   76.32%

We proposed some handcrafted features to develop an algorithm to identify and classify argument components and to increase the system's accuracy. This experiment used 24 sub-features, which produced 68.46% accuracy. In addition, we ran the system using each feature category separately to identify each category's performance (Table 6).

Table 6. Accuracy result of proposed handcrafted features

  Feature Name       Accuracy
  Structural         63.81%
  Indicator          49.45%
  Contextual         59.94%
  Lexical            49.58%
  Prompt Similarity  54.70%
  Discourse          49.43%
  All Features       68.46%

From the results in Table 6, the system achieved 68.46% accuracy, with the highest accuracy achieved by the structural features, followed by the contextual and prompt similarity features as the second and third best performing groups.

The experiments also implemented additional features obtained from previous state-of-the-art research. There are 16 additional sub-features implemented in this scenario. Based on Table 7, the system achieved 71.08% accuracy, with the most significant accuracy achieved by the structural and contextual features. The word embedding features performed worst in this experiment.

Table 7. Accuracy result of additional features from state-of-the-art research

  Feature Name       Accuracy
  Structural         61.15%
  Indicator          53.95%
  Contextual         50.69%
  Lexical            59.27%
  Syntactic          50.72%
  Prompt Similarity  54.79%
  Word Embedding     49.46%
  All Features       71.08%

5. CONCLUSIONS
After all the experiments we have done to detect and classify argument components, we found that 79.96% accuracy was achieved by implementing the full feature set. We defined 68 sub-features summarized into 8 categories: structural, lexical, indicator, contextual, syntactic, word embedding, prompt similarity, and discourse features. We found that the structural features had the most significant impact on the system's accuracy, obtaining 77.83% accuracy on their own. The other significant groups are contextual and lexical, with accuracies of 63.10% and 61.06%.

The most significant feature combination was the combination of all features without discourse features. This combination obtained 79.98% accuracy, higher than the accuracy of all features combined. The combination of all features without structural features performed worst, so we conclude that structural features were the most significant group while discourse features were the least. Besides, the combination of the 3 (three) structural, contextual, and lexical feature groups also performed well, with 77.87% accuracy. Among the compared configurations in Section 4.3, the features proposed by Stab and Gurevych (2014b) performed best, with 76.32%. Each configuration in the feature comparison obtained more than 67% accuracy, meaning each of them could identify argument components at better than 67%.

Since the experiments showed that the most significant features were structural, contextual, and lexical, we will focus on developing these groups in our next experiments. We also find that a larger amount of training data, covering various topics and characteristics, will probably increase the system's accuracy. Besides, we also need to define other features or methods that can help to further differentiate premises from claims.

6. ACKNOWLEDGMENTS
This research work was supported by Bina Nusantara University and partly supported by a research grant from the Directorate General of Research and Development Reinforcement, Ministry of Research, Technology and Higher Education of the Republic of Indonesia.

7. REFERENCES
[1] Al-Khatib, K., Wachsmuth, H., Hagen, M., Köhler, J., and Stein, B. 2016. Cross-domain mining of argumentative text through distant supervision. In Proceedings of the 15th Conference of the North American Chapter of the Association for Computational Linguistics (NAACL '16). Association for Computational Linguistics, San Diego, CA, USA, 2016.
[2] Feng, V.W. and Hirst, G. 2011. Classifying arguments by scheme. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, Portland, Oregon, pp. 987-996, 2011.
[3] Habernal, I. and Gurevych, I. 2015. Exploiting debate portals for semi-supervised argumentation mining in user-generated web discourse. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), pp. 2127-2137, Lisbon, Portugal, 2015.
[4] Habernal, I. and Gurevych, I. 2016. Argumentation mining in user-generated web discourse. Computational Linguistics, in press.
[5] Lawrence, J. and Reed, C. 2015. Combining argument mining techniques. In Proceedings of the 2nd Workshop on Argumentation Mining, Denver, Colorado, pp. 127-136, 2015.
[6] Lippi, M. and Torroni, P. 2015. Context-independent claim detection for argumentation mining. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015).
[7] Moens, M.F. 2014. Tutorial: Argumentation Mining. Belgium.
[8] Moens, M.F., Boiy, E., Palau, R.M., and Reed, C. 2007. Automatic detection of arguments in legal texts. In Proceedings of the 11th International Conference on Artificial Intelligence and Law, ICAIL '07, pages 225-230, Stanford, CA, USA.
[9] Palau, R.M. 2008. Automatic argumentation detection. Project ACILA - Automatic Detection and Classification of Arguments in a Legal Case, Leuven, Belgium.
[10] Palau, R.M. and Moens, M.F. 2009. Argumentation mining: the detection, classification and structure of arguments in text. In Proceedings of the 12th International Conference on Artificial Intelligence and Law, ICAIL '09, pp. 98-107, Barcelona, Spain, 2009.
[11] Peldszus, A. 2014. Towards segment-based recognition of argumentation structure in short texts. In Proceedings of the First Workshop on Argumentation Mining, pages 88-97, Baltimore, Maryland, USA, June 26, 2014.
[12] Peldszus, A. and Stede, M. 2013. From argument diagrams to argumentation mining in texts: a survey. International Journal of Cognitive Informatics and Natural Intelligence, Volume 7, Issue 1, January 2013, Pages 1-31.
[13] Prasad, R., Miltsakaki, E., Dinesh, N., Lee, A., Joshi, A., Robaldo, L., and Webber, B.L. 2007. The Penn Discourse Treebank 2.0 annotation manual. Technical report, Institute for Research in Cognitive Science, University of Pennsylvania.
[14] Stab, C. and Gurevych, I. 2014a. Annotating argument components and relations in persuasive essays. In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), pp. 1501-1510, Dublin, Ireland, 2014.
[15] Stab, C. and Gurevych, I. 2014b. Identifying argumentative discourse structures in persuasive essays. In Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), pp. 46-56, Doha, Qatar, 2014.
[16] Stab, C. and Gurevych, I. 2016. Parsing argumentation structures in persuasive essays. arXiv preprint, under review, April 2016. Germany: Technische Universität Darmstadt.
[17] Stab, C. and Gurevych, I. 2016. Recognizing the absence of opposing arguments in persuasive essays. In Proceedings of the 3rd Workshop on Argument Mining, held in conjunction with the 2016 Annual Meeting of the Association for Computational Linguistics (ACL 2016), pp. 113-118, August 2016.
[18] Stab, C. and Habernal, I. 2015. Detecting argument components and structures. In Report of Dagstuhl Seminar on Debating Technologies (15512), Vol. 5, p. 32, 2016.
[19] Toulmin, S.E. 1958. The Uses of Argument. Cambridge University Press.
[20] Wachsmuth, H., Al-Khatib, K., and Stein, B. 2016. Using Argument Mining to Assess the Argumentation Quality of Essays. Germany: Bauhaus-Universität Weimar.
[21] Walton, D. 2009. Argumentation theory: a very short introduction. In Argumentation in Artificial Intelligence, pp. 1-24.