Argument Mining on Italian News Blogs
Pierpaolo Basile
University of Bari
pierpaolo.basile@uniba.it

Valerio Basile
Université Côte d’Azur, Inria, CNRS, I3S, France
valerio.basile@inria.fr

Elena Cabrio, Serena Villata
Université Côte d’Azur, CNRS, Inria, I3S, France
firstname.lastname@unice.fr

Abstract

English. The goal of argument mining is to extract structured information, namely
the arguments and their relations, from unstructured text. In this paper, we propose
an approach to argument relation prediction based on supervised learning of
linguistic and semantic features of the text. We test our method on the CorEA corpus
of user comments to online newspaper articles, evaluating our system’s performance
in assigning the correct relation, i.e., support or attack, to pairs of arguments.
We obtain results consistently better than a sentiment analysis-based baseline (over
two out of three pairs correctly classified), and we observe that sentiment and
lexical semantics are the most informative features with respect to the relation
prediction task.

Italian. Automatic argument extraction aims to retrieve structured information, in
particular the arguments and their relations, from plain text. In this contribution
we propose a method for predicting relations between arguments based on supervised
learning of linguistic and semantic features of the text. The method is tested on
the CorEA corpus of news comments, and we evaluate the system’s ability to classify
support and attack relations between pairs of arguments. The results obtained are
better than a baseline based on sentiment analysis alone (more than two out of three
argument pairs are classified correctly), and we observe that sentiment and lexical
semantics are the most informative indicators for predicting relations between
arguments.

1    Introduction

The argument mining (Peldszus and Stede, 2013; Lippi and Torroni, 2016) research
area has recently become very relevant in computational linguistics. Its main goal
is the automated extraction of natural language arguments and their relations from
generic textual corpora, with the final goal of providing machine-readable
structured data for computational models of argument and reasoning engines. Two main
stages have to be considered in the typical argument mining pipeline, from the
unstructured natural language documents towards structured (possibly
machine-readable) data: (i) argument extraction, i.e., detecting arguments within
the input natural language texts, and (ii) relation extraction, i.e., predicting
which relations hold between the arguments identified in the first stage. The
relation prediction task is extremely complex, as it involves high-level knowledge
representation and reasoning issues. The relations between the arguments may be of
heterogeneous nature, like attack, support or entailment (Cabrio and Villata, 2013).

The increasing amount of data available on the Web from heterogeneous sources, e.g.,
social network posts, forums, news blogs, and the specific form of language adopted
there challenge argument mining methods, with the aim of helping users understand
and interact with such a huge amount of information.

In this paper, we address this issue by presenting an argument relation prediction
approach for Italian. We test the method on the CorEA corpus (Celli et al., 2014) of
user comments to the news articles of an Italian newspaper, annotated with agreement
(i.e., support) and disagreement (i.e., attack) relations. We extract argument-level
features from the CorEA comment (i.e., argument)
pairs, and we train our system to predict the support and attack relations.

2   Mining Arguments

A debate, whether it happens online or in person, can be modeled as a set of
arguments proposed by the participants. Arguments can be independent, for instance
expressing the participant’s stance on a particular topic, but often they are
replies to previous arguments put forward in the debate. This results in a network
structure of the debate, that is, a (possibly disconnected) directed graph where
nodes are arguments, and the two kinds of edges are the support and attack relations
between them. In Figure 1, each node represents an argument with a numeric
identifier, solid and dashed edges represent respectively support and attack
relations, and dotted edges are neutral relations. The hub-like node labeled 11 is a
news article, thus attracting many first-level comments.

Figure 1: Example of debate structure.

The goal of our work is to be able to predict the relations between the arguments in
a given debate, thus reconstructing the relation graph. We therefore cast the
problem as a classification task: given two arguments from a debate, we aim to
predict whether one argument attacks the other, supports it, or there is no relation
between the two arguments. The construction of the graph structure is then
straightforward, resulting from the combination of all the argument pairs we
considered.

2.1    Features

We extract argument-level features from the CorEA comment pairs, which we group into
the following categories:

Lexical. We take into account several lexical features: tokens, bi-grams, and the
first bi-gram and tri-gram of each argument.

Syntactic. We exploit the output of a dependency parser. We consider two kinds of
dependency features: the former is the original output, the latter generalizes a
word to its POS tag. For instance, “amod(denaro, pubblico)” is generalized as
“amod(NN, pubblico)” and “amod(denaro, ADJ)”. We adopt the Malt parser (Nivre, 2003)
trained on the Universal Dependency Treebank
(http://universaldependencies.org/it/overview/introduction.html).

Message info. We extract the argument size, the number of uppercase words, the
number of negations (the occurrences of the word “non”), the number of sequences of
two or more punctuation characters, and the number of citations. A citation is a
quoted sequence of words in the second argument that occurs in the first argument.

Message overlap. The cosine similarity between the two arguments is computed on
their TF/IDF vectors.

Word-embedding. We build word embeddings on the Paisà corpus with the word2vec
(Mikolov et al., 2013) tool. We use a vector dimension equal to 50, and we consider
only words that occur at least 20 times. For each argument, we use the vector
components as features directly.

Sentiment. We extract the sentiment from the arguments with two separate tools.
Alchemy API (http://www.alchemyapi.com/), the sentiment analysis feature of IBM’s
Semantic Text Analysis API, returns a polarity label (positive, negative or neutral)
and a
polarity score between -1 (totally negative) and 1 (totally positive). The UNIBA
system (Basile and Novielli, 2014), one of the most successful participants in the
Sentipolc task at Evalita 2014 (Basile et al., 2014), returns a subjectivity label
(subjective or objective) and a polarity label (positive, negative, neutral or
mixed).

Topic model. We train a domain-independent topic model for Italian and compute, for
each argument, its representing vector in the topic space. The 300-dimensional topic
model is created with Gensim⁴ using the ItWaC corpus (Baroni et al., 2009). We use
the vector components as features directly, i.e., each comment has 300 topic-based
features.

3       Evaluation

The goal of the evaluation is twofold: i) to compute the performance of several
machine learning methods and compare them with respect to some baselines, and ii) to
investigate the importance of each group of features through an ablation test.

3.1      Data

We test our approach on the CorEA corpus (Celli et al., 2014), a collection of text
from Italian news blogs. It contains 27 news articles, about 1,660 unique authors
and more than 2,900 comments. The corpus is annotated with emotions and, most
interestingly for our work, the comments are annotated pair-wise with agreement
information (Celli et al., 2016). We extracted such comment pairs for a total of
1,275 pairs: 682 disagreement, 106 neutral, 180 agreement (307 pairs are not
classified); examples are shown in Figure 2.

The CorEA dataset provides several pieces of information about each message. Besides
the features described in Section 2.1, we also extract the following
dataset-dependent features: the set of manually annotated topics, the news category
of the article, the count of replies to the message, the count of message likes, the
participant’s activity score, the participant’s interests, the participant’s page
views, the participant’s total comments, the participant’s total shares, the
participant’s likes received, and the overall emotion declared by the participant
after reading the articles.

3.2   System setup

We exploit two kinds of learning algorithms: 1) different configurations of SVM
based on linear kernel (SVM_lin), degree-2 polynomial kernel (SVM_poly), and RBF
kernel (SVM_rbf); 2) Random Forest (RF).

The baseline method always predicts the most frequent class, in this case “attack”.
Moreover, we test the two simple sentiment analysis systems already described in
Section 2.1, SA_alchemy and SA_uniba. In particular, these systems exploit the
result of the sentiment analysis in terms of polarity (positive, negative, or
neutral) for predicting the relation between two arguments: if both arguments have
neutral polarity, the pair is tagged as “neutral”, while it is tagged as “support”
if the two arguments otherwise have the same polarity; in all other cases the
“attack” class is predicted. The system is implemented in Java relying on the Weka
tool (Hall et al., 2009). All the experiments are performed by adopting 10-fold
cross-validation. For all the learning methods, we adopt the default Weka parameters
since the goal of our work is not to optimize the classification performance but to
provide a feature study.

3.3   Results

Table 1 reports the best results obtained by each method. For RF, the best result is
obtained using 10 trees, while for SVM we optimize only the C parameter, using
default values for the other ones. The best C value is 1 for SVM_lin and 2 in all
the other settings.

Each one of the supervised systems performs better than the baseline. The good
performance of the linear kernel classifier is likely to be ascribed to the high
number of features. The performance of Random Forest is also quite good, considering
that only ten trees are employed.

Table 1: Results

System        P        R        F
baseline      0.4964   0.7045   0.5824
SA_alchemy    0.3553   0.3616   0.3584
SA_uniba      0.2942   0.3286   0.3105
SVM_lin       0.6789   0.7169   0.6719
RF            0.6607   0.7180   0.6491
SVM_poly      0.6609   0.7097   0.6486
SVM_rbf       0.6414   0.7076   0.6120

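As a sanity check on the baseline rows of Table 1, the following minimal sketch re-derives the majority-class scores from the class counts given in Section 3.1 and spells out the polarity rule of the sentiment baselines described in Section 3.2. It assumes that P, R and F are averages over the three classes weighted by class frequency; this is an assumption, although it is consistent with the reported baseline figures. The function names and the treatment of labels such as “mixed” are hypothetical and are not taken from the authors’ Java/Weka implementation.

```python
from collections import Counter

# Labelled pairs from Section 3.1 (disagreement = attack, agreement = support).
COUNTS = Counter({"attack": 682, "neutral": 106, "support": 180})


def weighted_majority_scores(counts):
    """Frequency-weighted P, R, F of a classifier that always predicts the majority class."""
    total = sum(counts.values())
    majority, n_major = counts.most_common(1)[0]
    # Only the majority class has non-zero scores: every pair is predicted as
    # that class, so its recall is 1 and its precision is its share of the data.
    precision_major = n_major / total
    recall_major = 1.0
    f_major = 2 * precision_major * recall_major / (precision_major + recall_major)
    weight = n_major / total  # weight of the majority class in the average
    return majority, weight * precision_major, weight * recall_major, weight * f_major


def sentiment_rule(polarity_a, polarity_b):
    """Polarity rule of the sentiment baselines, following Section 3.2."""
    if polarity_a == "neutral" and polarity_b == "neutral":
        return "neutral"
    # Assumption: any other label (e.g. "mixed") is simply compared for equality.
    return "support" if polarity_a == polarity_b else "attack"


majority, p, r, f = weighted_majority_scores(COUNTS)
print(majority, round(p, 4), round(r, 4), round(f, 4))
print(sentiment_rule("positive", "negative"))
```

With the counts above, the first printed line is `attack 0.4964 0.7045 0.5824`, which matches the baseline row of Table 1.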
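⁴ https://radimrehurek.com/gensim/

Relation     Example
Attack       “in certi paesi 100 sterline a settimana permettono di vivere come un pascià”
             (“in some countries 100 pounds a week let you live like a pasha”)
             “si ma in certi altri no..;-) la cifra mi sembra davvero esigua..”
             (“yes, but in some others they don’t..;-) the amount seems really meagre to me..”)
Support      “Caro Renzi , hai visto com’è semplice restituire i soldi? Basta una firmetta... perchè
             non lo fai anche tu invece di promettere e promettere e promettere?”
             (“Dear Renzi, have you seen how simple it is to give the money back? A little
             signature is enough... why don’t you do it too instead of promising and promising
             and promising?”)
             “Bisogna prendere atto che il movimento 5 stelle sta davvero restituendo i
             soldi agli Italiani. Questo è un fatto, tutto il resto sono chiacchere.”
             (“It must be acknowledged that the 5 Star Movement is really giving the money
             back to Italians. This is a fact, everything else is chatter.”)
Neutral      “E le riforme?”
             (“And the reforms?”)
             “le riforme cominciano dl’atteggiamento dei parlamentari. con il
             cambiamento del mind-set . il punto di partenza.”
             (“the reforms start from the attitude of the members of parliament. with the
             change of mind-set. the starting point.”)

Figure 2: Examples of relations between pairs of comments in CorEA.

As can be seen from the results of ablation tests (see Table 2), the features that
contribute the most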
to the argument classification task are the semantic features (i.e., embeddings) and
the sentiment features. This confirms our hypothesis that sentiment is key
information for argument mining, and more specifically for the relation prediction
task. The results also confirm that lexical and semantic features are useful for the
task, as expected. Table 2 also reports the number of features (Feat.Size) and the
F1 (F1-f) achieved by exploiting the respective feature in isolation. It is
important to note that, despite the poor performance obtained by both embedding and
sentiment features in isolation, their contribution to the overall performance is
relevant.

Table 2: Ablation test

Features      F1       ∆%      Feat.Size   F1-f
all           0.6719   -       220,499     -
-lexical      0.6624   -1.42   140,443     0.66
-syntactic    0.6702   -0.26    80,909     0.65
-info         0.6691   -0.42   220,490     0.58
-CorEA        0.6674   -0.68   220,218     0.64
-embedding    0.6525   -2.89   220,399     0.59
-overlap      0.6724    0.07   220,498     0.58
-sentiment    0.6622   -1.45   220,491     0.58
-topic        0.6673   -0.69   220,045     0.59

4     Related Work

Lippi and Torroni (2016) and Peldszus and Stede (2013) provide an overview of the
argument mining research area. In particular, some approaches have been recently
proposed to address the same task addressed in this paper, i.e., predicting
relations between arguments, even if ours is the first effort for the Italian
language. Aharoni et al. (2014) assume that evidence is always associated with a
claim, enabling the use of information about the claim to predict the evidence. The
support relations are thus obtained by definition when predicting the evidence.
Mochales and Moens (2011) have addressed the problem by parsing with a manually-built
context-free grammar to predict relations between argument components. The grammar
rules follow the typical rhetorical and structural patterns of sentences in
juridical texts. This is a highly genre-specific approach, and its direct use in
other genres would be unlikely to yield accurate results. Stab and Gurevych (2014)
instead employ a binary SVM classifier to predict relations in a claim/premise
model. Biran and Rambow (2011) apply the same method adopted for the detection of
premises also for the prediction of relations between premises and claims. Wang and
Cardie (2014) apply a sequential model based on isotonic Conditional Random Fields
to make predictions at the sentence or segment level on discussions on Wikipedia
Talk pages. Finally, Cabrio and Villata (2013) adopt Textual Entailment to infer
whether a support or attack relation between two given arguments holds.

5   Conclusions

In this paper, we have presented a supervised approach for argument relation
prediction for Italian, mainly relying on features including semantics and
sentiment. We tested this approach on the CorEA corpus, extracted from user comments
to online news. Our experimental results are encouraging (the linear-kernel SVM
reaches an F-score of 0.6719, against 0.5824 for the majority-class baseline), and
foster future research in the direction of including semantics as well as sentiment
analysis in the argument mining pipeline. It will also be interesting, as future
work, to refine the model in order to consider the full sequence of interactions
between arguments.

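The ablation figures that support these conclusions can be re-derived from Table 2. The sketch below assumes that the ∆% column is the relative change in F1 with respect to the full feature set; the paper does not state the formula, so this is an assumption, and the variable and function names are illustrative only. Because the F1 values in the table are rounded to four decimals, the recomputed values can differ from the printed ones by about 0.01.

```python
# F1 of the full model and of each ablated configuration, copied from Table 2.
F1_ALL = 0.6719
F1_WITHOUT = {
    "lexical": 0.6624,
    "syntactic": 0.6702,
    "info": 0.6691,
    "CorEA": 0.6674,
    "embedding": 0.6525,
    "overlap": 0.6724,
    "sentiment": 0.6622,
    "topic": 0.6673,
}

# The ∆% column exactly as printed in Table 2.
REPORTED_DELTA = {
    "lexical": -1.42,
    "syntactic": -0.26,
    "info": -0.42,
    "CorEA": -0.68,
    "embedding": -2.89,
    "overlap": 0.07,
    "sentiment": -1.45,
    "topic": -0.69,
}


def relative_change(f1_ablated, f1_all=F1_ALL):
    """Relative change in F1, in percent, when one feature group is removed."""
    return 100.0 * (f1_ablated - f1_all) / f1_all


# List the feature groups from the most to the least harmful removal.
for group, f1 in sorted(F1_WITHOUT.items(), key=lambda item: item[1]):
    recomputed = relative_change(f1)
    print(f"-{group:<10} {recomputed:+.2f}%  (reported {REPORTED_DELTA[group]:+.2f}%)")
```

Under this reading, removing the embedding features causes the largest drop, followed by the sentiment and lexical features, in line with the discussion of Table 2.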
References

Ehud Aharoni, Anatoly Polnarov, Tamar Lavee, Daniel Hershcovich, Ran Levy, Ruty
  Rinott, Dan Gutfreund, and Noam Slonim. 2014. A benchmark dataset for automatic
  detection of claims and evidence in the context of controversial topics. In
  Proceedings of the First Workshop on Argumentation Mining, pages 29–38, Baltimore,
  Maryland, June. Association for Computational Linguistics.

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The
  WaCky wide web: A collection of very large linguistically processed web-crawled
  corpora. Language Resources and Evaluation.

Pierpaolo Basile and Nicole Novielli. 2014. UNIBA at EVALITA 2014-SENTIPOLC task:
  Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic
  features. Proceedings of EVALITA, pages 58–63.

Valerio Basile, Andrea Bolioli, Malvina Nissim, Viviana Patti, and Paolo Rosso.
  2014. Overview of the Evalita 2014 SENTIment POLarity Classification Task. In
  Proceedings of the 4th evaluation campaign of Natural Language Processing and
  Speech tools for Italian (EVALITA’14), Pisa, Italy.

Or Biran and Owen Rambow. 2011. Identifying justifications in written dialogs by
  classifying text as argumentative. Int. J. Semantic Computing, 5(4):363–381.

Elena Cabrio and Serena Villata. 2013. A natural language bipolar argumentation
  approach to support users in online debate interactions. Argument & Computation,
  4(3):209–230.

Fabio Celli, Giuseppe Riccardi, and Arindam Ghosh. 2014. CorEA: Italian news corpus
  with emotions and agreement. In CLIC-it 2014, pages 98–102.

Fabio Celli, Giuseppe Riccardi, and Firoj Alam. 2016. Multilevel annotation of
  agreement and disagreement in Italian news blogs. In Nicoletta Calzolari
  (Conference Chair), Khalid Choukri, Thierry Declerck, Sara Goggi, Marko Grobelnik,
  Bente Maegaard, Joseph Mariani, Helene Mazo, Asuncion Moreno, Jan Odijk, and
  Stelios Piperidis, editors, Proceedings of the Tenth International Conference on
  Language Resources and Evaluation (LREC 2016), Paris, France, May. European
  Language Resources Association (ELRA).

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and
  Ian H. Witten. 2009. The WEKA data mining software: An update. SIGKDD Explor.
  Newsl., 11(1):10–18, November.

Marco Lippi and Paolo Torroni. 2016. Argumentation mining: State of the art and
  emerging trends. ACM Trans. Internet Techn., 16(2):10.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation
  of word representations in vector space. In Workshop at ICLR, 2013.

Raquel Mochales and Marie-Francine Moens. 2011. Argumentation mining. Artificial
  Intelligence and Law, 19(1):1–22.

Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In
  Proceedings of the 8th International Workshop on Parsing Technologies (IWPT).

Andreas Peldszus and Manfred Stede. 2013. From argument diagrams to argumentation
  mining in texts: A survey. IJCINI, 7(1):1–31.

Christian Stab and Iryna Gurevych. 2014. Identifying argumentative discourse
  structures in persuasive essays. In Proceedings of the 2014 Conference on
  Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014,
  Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages
  46–56.

Lu Wang and Claire Cardie. 2014. Improving agreement and disagreement identification
  in online discussions with a socially-tuned sentiment lexicon. In Proceedings of
  the 5th Workshop on Computational Approaches to Subjectivity, Sentiment and Social
  Media Analysis, pages 97–106, Baltimore, Maryland, June. Association for
  Computational Linguistics.