Argument Mining on Italian News Blogs

Pierpaolo Basile
University of Bari
pierpaolo.basile@uniba.it

Valerio Basile
Université Côte d'Azur, Inria, CNRS, I3S, France
valerio.basile@inria.fr

Elena Cabrio, Serena Villata
Université Côte d'Azur, CNRS, Inria, I3S, France
firstname.lastname@unice.fr

Abstract

English. The goal of argument mining is to extract structured information, namely the arguments and their relations, from unstructured text. In this paper, we propose an approach to argument relation prediction based on supervised learning of linguistic and semantic features of the text. We test our method on the CorEA corpus of user comments to online newspaper articles, evaluating our system's performance in assigning the correct relation, i.e., support or attack, to pairs of arguments. We obtain results consistently better than a sentiment analysis-based baseline (over two out of three pairs correctly classified), and we observe that sentiment and lexical semantics are the most informative features with respect to the relation prediction task.

Italiano. L'estrazione automatica di argomenti ha come scopo recuperare informazione strutturata, in particolare gli argomenti e le loro relazioni, a partire da testo semplice. In questo contributo proponiamo un metodo di predizione delle relazioni tra argomenti basato sull'apprendimento supervisionato di feature linguistiche e semantiche del testo. Il metodo è testato sul corpus di commenti di news CorEA, ed è valutata la capacità del sistema di classificare le relazioni di supporto ed attacco tra coppie di argomenti. I risultati ottenuti sono superiori ad una baseline basata sulla sola analisi del sentimento (oltre due coppie di argomenti su tre è classificata correttamente) ed osserviamo che il sentimento e la semantica lessicale sono gli indicatori più informativi per la predizione delle relazioni tra argomenti.

1 Introduction

The argument mining research area (Peldszus and Stede, 2013; Lippi and Torroni, 2016) has recently become very relevant in computational linguistics. Its main goal is the automated extraction of natural language arguments and their relations from generic textual corpora, with the final goal of providing machine-readable structured data for computational models of argument and reasoning engines. Two main stages have to be considered in the typical argument mining pipeline, from unstructured natural language documents towards structured (possibly machine-readable) data: (i) argument extraction, i.e., detecting arguments within the input natural language texts, and (ii) relation extraction, i.e., predicting which relations hold between the arguments identified in the first stage. The relation prediction task is extremely complex, as it involves high-level knowledge representation and reasoning issues. The relations between the arguments may be of heterogeneous nature, like attack, support or entailment (Cabrio and Villata, 2013).

The increasing amount of data available on the Web from heterogeneous sources, e.g., social network posts, forums, news blogs, and the specific form of language adopted there challenge argument mining methods, whose aim is to help users understand and interact with such a huge amount of information.

In this paper, we address this issue by presenting an argument relation prediction approach for Italian. We test the method on the CorEA corpus (Celli et al., 2014) of user comments to the news articles of an Italian newspaper, annotated with agreement (i.e., support) and disagreement (i.e., attack) relations. We extract argument-level features from the CorEA comment (i.e., argument) pairs, and we train our system to predict the support and attack relations.

2 Mining Arguments

A debate, whether it happens online or in person, can be modeled as a set of arguments proposed by the participants. Arguments can be independent, for instance expressing the participant's stance on a particular topic, but often they are replies to previous arguments put forward in the debate. This results in a network structure of the debate, that is, a (possibly disconnected) directed graph where nodes are arguments, and the two kinds of edges are the support and attack relations between them. In Figure 1, each node represents an argument with a numeric identifier, filled and dashed edges represent respectively support and attack relations, and dotted edges are neutral relations. The hub-like node labeled 11 is a news article, thus attracting many first-level comments.

Figure 1: Example of debate structure.

The goal of our work is to predict the relations between the arguments in a given debate, thus reconstructing the relation graph. We therefore cast the problem as a classification task: given two arguments from a debate, we aim to predict whether one argument attacks the other, supports it, or there is no relation between the two arguments. The construction of the graph structure is then straightforward, resulting from the combination of all the argument pairs we considered.

2.1 Features

We extract argument-level features from the CorEA comment pairs, which we group into the following categories:

Lexical We take into account several lexical features: tokens, bi-grams, and the first bi-gram and tri-gram of each argument.

Syntactic We exploit the output of a dependency parser. We consider two kinds of dependency features: the former is the original output, the latter generalizes a word to its POS tag. For instance, "amod(denaro, pubblico)" is generalized as "amod(NN, pubblico)" and "amod(denaro, ADJ)". We adopt the Malt parser (Nivre, 2003) trained on the Universal Dependency Treebank [1].

Message info We extract the argument size, the number of uppercase words, the number of negations [2], the number of sequences of two or more punctuation characters, and the number of citations. A citation is a quoted sequence of words in the second argument that occurs in the first argument.

Message overlap Cosine similarity between two arguments is computed exploiting TF/IDF.

Word-embedding We build word-embeddings relying on the Paisà corpus through the word2vec (Mikolov et al., 2013) tool. We use a vector dimension equal to 50, and we consider only words that occur at least 20 times. For each argument, we use the vector components as features directly.

Sentiment We extract the sentiment from the arguments with two separate tools. Alchemy API [3], the sentiment analysis feature of IBM's Semantic Text Analysis API, returns a polarity label (positive, negative or neutral) and a polarity score between -1 (totally negative) and 1 (totally positive). The UNIBA system (Basile and Novielli, 2014), one of the most successful participants in the Sentipolc task at Evalita 2014 (Basile et al., 2014), returns a subjectivity label (subjective or objective) and a polarity label (positive, negative, neutral or mixed).

Topic model We train a domain-independent topic model for Italian and compute, for each argument, its representing vector in the topic space. The 300-dimensional topic model is created with Gensim [4] using the ItWaC corpus (Baroni et al., 2009). We use the vector components as features directly, i.e., each comment has 300 topic-based features.

[1] http://universaldependencies.org/it/overview/introduction.html
[2] The occurrences of the word "non"
[3] http://www.alchemyapi.com/
[4] https://radimrehurek.com/gensim/

3 Evaluation

The goal of the evaluation is twofold: i) to compute the performance of several machine learning methods and compare them with respect to some baselines, and ii) to investigate the importance of each group of features through an ablation test.

3.1 Data

We test our approach on the CorEA corpus (Celli et al., 2014), a collection of text from Italian news blogs. It contains 27 news articles, about 1,660 unique authors and more than 2,900 comments. The corpus is annotated with emotions and, most interestingly for our work, the comments are annotated pair-wise with agreement information (Celli et al., 2016). We extracted such comment pairs for a total of 1,275 pairs: 682 disagreement, 106 neutral, 180 agreement. The remaining 307 pairs are not classified. Figure 2 shows examples of each relation.

The CorEA dataset provides several pieces of information about each message. Besides the features described in Section 2.1, we also extract the following dataset-dependent features: the set of manually annotated topics, the news category of the article, the count of replies to the message, the count of message likes, the participant's activity score, the participant's interests, the participant's page views, the participant's total comments, the participant's total shares, the participant's likes received, and the overall emotion declared by the participant after reading the articles.

Relation  Example
Attack    "in certi paesi 100 sterline a settimana permettono di vivere come un pascià"
          (in some countries 100 pounds a week let you live like a pasha)
          "si ma in certi altri no..;-) la cifra mi sembra davvero esigua.."
          (yes, but in some others they don't ;-) the amount seems really meagre to me)
Support   "Caro Renzi , hai visto com'è semplice restituire i soldi? Basta una firmetta... perchè non lo fai anche tu invece di promettere e promettere e promettere?"
          (Dear Renzi, have you seen how simple it is to give the money back? All it takes is a little signature... why don't you do it too instead of promising and promising and promising?)
          "Bisogna prendere atto che il movimento 5 stelle sta davvero restituendo i soldi agli Italiani. Questo è un fatto, tutto il resto sono chiacchere."
          (We have to acknowledge that the 5 Star Movement is really giving the money back to Italians. This is a fact, all the rest is chatter.)
Neutral   "E le riforme?"
          (And the reforms?)
          "le riforme cominciano dl'atteggiamento dei parlamentari. con il cambiamento del mind-set . il punto di partenza."
          (reforms begin with the attitude of the members of parliament. with the change of mind-set. the starting point.)

Figure 2: Examples of relations between pairs of comments in CorEA.

3.2 System setup

We exploit two kinds of learning algorithms: 1) different configurations of SVM based on linear kernel (SVM_lin), degree-2 polynomial kernel (SVM_poly), and RBF kernel (SVM_rbf); 2) Random Forest (RF).

The baseline method always predicts the most frequent class, in this case "attack". Moreover, we test the two simple sentiment analysis systems already described in Section 2.1, SA_alchemy and SA_uniba. In particular, these systems exploit the result of the sentiment analysis in terms of polarity (positive, negative, or neutral) for predicting the relation between two arguments: if both arguments have neutral polarity, the pair is tagged as "neutral"; otherwise, if the two arguments have the same polarity, the pair is tagged as "support"; in all other cases the "attack" class is predicted. The system is implemented in Java relying on the Weka tool (Hall et al., 2009). All the experiments are performed with 10-fold cross-validation. For all the learning methods, we adopt the default Weka parameters since the goal of our work is not to optimize the classification performance but to provide a feature study.

3.3 Results

Table 1 reports the best results obtained by each method. For RF the best result is obtained using 10 trees, while for SVM we optimize only the C parameter, using default values for the other ones. The best C value is 1 for SVM_lin and 2 in all the other settings.

Table 1: Results

System       P       R       F
baseline     0.4964  0.7045  0.5824
SA_alchemy   0.3553  0.3616  0.3584
SA_uniba     0.2942  0.3286  0.3105
SVM_lin      0.6789  0.7169  0.6719
RF           0.6607  0.7180  0.6491
SVM_poly     0.6609  0.7097  0.6486
SVM_rbf      0.6414  0.7076  0.6120

Each one of the supervised systems performs better than the baseline. The good performance of the linear kernel classifier is likely to be ascribed to the high number of features. The performance of Random Forest is also quite good, considering that only ten trees are employed.

As can be seen from the results of the ablation tests (see Table 2), the features that contribute the most to the argument classification task are the semantic features (i.e., embeddings) and the sentiment features. This confirms our hypothesis that sentiment is key information for argument mining, and more specifically for the relation prediction task. The results also confirm that lexical and semantic features are useful for the task, as expected. Table 2 also reports the number of features left after removing each group (Feat.Size) and the F1 (F1-f) achieved by exploiting the respective feature group in isolation. It is important to note that, despite the weak performance of the embedding and sentiment features in isolation (F1-f of 0.59 and 0.58, respectively), their contribution to the overall performance is relevant.

Table 2: Ablation test

Features     F1      ∆%     Feat.Size  F1-f
all          0.6719  -      220,499    -
-lexical     0.6624  -1.42  140,443    0.66
-syntactic   0.6702  -0.26  80,909     0.65
-info        0.6691  -0.42  220,490    0.58
-CorEA       0.6674  -0.68  220,218    0.64
-embedding   0.6525  -2.89  220,399    0.59
-overlap     0.6724  0.07   220,498    0.58
-sentiment   0.6622  -1.45  220,491    0.58
-topic       0.6673  -0.69  220,045    0.59

4 Related Work

Lippi and Torroni (2016) and Peldszus and Stede (2013) provide an overview of the argument mining research area. In particular, some approaches have been recently proposed to address the same task addressed in this paper, i.e., predicting relations between arguments, even if ours is the first effort for the Italian language. Aharoni et al. (2014) assume that evidence is always associated with a claim, enabling the use of information about the claim to predict the evidence. The support relations are thus obtained by definition when predicting the evidence. Mochales and Moens (2011) have addressed the problem by parsing with a manually-built context-free grammar to predict relations between argument components. The grammar rules follow the typical rhetorical and structural patterns of sentences in juridical texts. This is a highly genre-specific approach, and its direct use in other genres would be unlikely to yield accurate results. Stab and Gurevych (2014) instead employ a binary SVM classifier to predict relations in a claim/premise model. Biran and Rambow (2011) apply the same method adopted for the detection of premises also for the prediction of relations between premises and claims. Wang and Cardie (2014) apply an isotonic Conditional Random Fields based sequential model to make predictions at the sentence or segment level on discussions on Wikipedia Talk pages. Finally, Cabrio and Villata (2013) adopt Textual Entailment to infer whether a support or attack relation between two given arguments holds.

5 Conclusions

In this paper, we have presented a supervised approach for argument relation prediction for Italian, mainly relying on features including semantics and sentiment. We tested this approach on the CorEA corpus, extracted from user comments to online news. Our best system (the linear-kernel SVM, F = 0.6719) outperforms both the majority-class baseline and the sentiment-only baselines, which encourages future research in the direction of including semantics as well as sentiment analysis in the argument mining pipeline. It will also be interesting, as future work, to refine the model in order to consider the full sequence of interactions between arguments.

References

Ehud Aharoni, Anatoly Polnarov, Tamar Lavee, Daniel Hershcovich, Ran Levy, Ruty Rinott, Dan Gutfreund, and Noam Slonim. 2014. A benchmark dataset for automatic detection of claims and evidence in the context of controversial topics. In Proceedings of the First Workshop on Argumentation Mining, pages 29–38, Baltimore, Maryland, June. Association for Computational Linguistics.

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation.

Pierpaolo Basile and Nicole Novielli. 2014. UNIBA at EVALITA 2014-SENTIPOLC task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features. Proceedings of EVALITA, pages 58–63.

Valerio Basile, Andrea Bolioli, Malvina Nissim, Viviana Patti, and Paolo Rosso. 2014. Overview of the Evalita 2014 SENTIment POLarity Classification Task. In Proceedings of the 4th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'14), Pisa, Italy.

Or Biran and Owen Rambow. 2011. Identifying justifications in written dialogs by classifying text as argumentative. Int. J. Semantic Computing, 5(4):363–381.

Elena Cabrio and Serena Villata. 2013. A natural language bipolar argumentation approach to support users in online debate interactions. Argument & Computation, 4(3):209–230.

Fabio Celli, Giuseppe Riccardi, and Arindam Ghosh. 2014. CorEA: Italian news corpus with emotions and agreement. In CLIC-it 2014, pages 98–102.

Fabio Celli, Giuseppe Riccardi, and Firoj Alam. 2016. Multilevel annotation of agreement and disagreement in Italian news blogs. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Paris, France, May. European Language Resources Association (ELRA).

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. 2009. The WEKA data mining software: An update. SIGKDD Explor. Newsl., 11(1):10–18, November.

Marco Lippi and Paolo Torroni. 2016. Argumentation mining: State of the art and emerging trends. ACM Trans. Internet Techn., 16(2):10.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Workshop at ICLR, 2013.

Raquel Mochales and Marie-Francine Moens. 2011. Argumentation mining. Artificial Intelligence and Law, 19(1):1–22.

Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT).

Andreas Peldszus and Manfred Stede. 2013. From argument diagrams to argumentation mining in texts: A survey. IJCINI, 7(1):1–31.

Christian Stab and Iryna Gurevych. 2014. Identifying argumentative discourse structures in persuasive essays. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 46–56.

Lu Wang and Claire Cardie. 2014. Improving agreement and disagreement identification in online discussions with a socially-tuned sentiment lexicon. In Proceedings of the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 97–106, Baltimore, Maryland, June. Association for Computational Linguistics.
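
A note on the baselines of the system setup (Section 3.2). The following is an illustrative Python sketch, not the Java/Weka implementation used in the experiments. The function names are hypothetical. `sentiment_rule` applies the polarity rule literally as worded in the text; how labels outside positive/negative/neutral (such as the "mixed" label of the UNIBA system) are handled is not specified in the paper, so here they simply go through the same comparisons, which is an assumption. `majority_baseline_scores` computes precision, recall and F for a classifier that always predicts the most frequent class, averaged over classes with weights proportional to class size. The paper does not state the averaging scheme, so the weighted average is also an assumption.

```python
from collections import Counter


def sentiment_rule(polarity_a: str, polarity_b: str) -> str:
    """Predict a relation from the polarity labels of two arguments.

    Literal reading of the rule in the text: both neutral -> "neutral",
    same (non-neutral) polarity -> "support", anything else -> "attack".
    """
    if polarity_a == "neutral" and polarity_b == "neutral":
        return "neutral"
    if polarity_a == polarity_b:
        return "support"
    return "attack"


def majority_baseline_scores(class_counts: dict) -> dict:
    """Support-weighted P/R/F of an always-predict-the-majority classifier.

    ASSUMPTION: scores are averaged over classes weighted by class size;
    classes that are never predicted contribute 0 to every score.
    """
    total = sum(class_counts.values())
    majority, majority_count = Counter(class_counts).most_common(1)[0]

    # Every pair is labelled with the majority class, so its precision is
    # its share of the data and its recall is 1.
    precision = majority_count / total
    recall = 1.0
    f1 = 2 * precision * recall / (precision + recall)

    weight = majority_count / total  # weight of the only non-zero class
    return {
        "majority": majority,
        "precision": weight * precision,
        "recall": weight * recall,
        "f1": weight * f1,
    }


if __name__ == "__main__":
    # Class sizes of the classified CorEA pairs reported in the data section.
    classified_pairs = {"attack": 682, "neutral": 106, "support": 180}
    scores = majority_baseline_scores(classified_pairs)
    print(scores["majority"],
          round(scores["precision"], 4),
          round(scores["recall"], 4),
          round(scores["f1"], 4))
    print(sentiment_rule("positive", "negative"))
```

With the 968 classified pairs (682 attack, 106 neutral, 180 support), the weighted scores round to 0.4964, 0.7045 and 0.5824, the values of the baseline row in Table 1. This agreement suggests, without proving it, that the reported P, R and F are support-weighted averages computed on the classified pairs only.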