=Paper=
{{Paper
|id=Vol-2646/18-paper
|storemode=property
|title=Explainability Methods for Natural Language Processing: Applications to Sentiment Analysis
|pdfUrl=https://ceur-ws.org/Vol-2646/18-paper.pdf
|volume=Vol-2646
|authors=Francesco Bodria,André Panisson,Alan Perotti,Simone Piaggesi
|dblpUrl=https://dblp.org/rec/conf/sebd/BodriaPPP20
}}
==Explainability Methods for Natural Language Processing: Applications to Sentiment Analysis==
Explainability Methods for Natural Language Processing: Applications to Sentiment Analysis (Discussion Paper)

Francesco Bodria¹, André Panisson², Alan Perotti², and Simone Piaggesi²,³

¹ Scuola Normale Superiore, Pisa, Italy, francesco.bodria@sns.it
² ISI Foundation, Turin, Italy, {andre.panisson,alan.perotti}@isi.it
³ Università di Bologna, Bologna, Italy, simone.piaggesi2@unibo.it

Abstract. Sentiment analysis is the process of classifying natural language sentences as expressing positive or negative sentiments, and it is a crucial task where the explanation of a prediction might arguably be as necessary as the prediction itself. We analysed different explanation techniques and applied them to the classification task of Sentiment Analysis. We explored how attention-based techniques can be exploited to extract meaningful sentiment scores with a lower computational cost than existing XAI methods.

Keywords: eXplainable Artificial Intelligence, Natural Language Processing, Sentiment Analysis

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by its editors. SEBD 2020, June 21-24, 2020, Villasimius, Italy.

1 Introduction

The World Wide Web is a great invention, as it connects everyone in the world, but at the same time it is a double-edged sword. In such an open-by-design architecture, everyone can express their feelings, from joy to anger. On social networks, negative sentiments can derail into hate speech, which can be shared easily, quickly and anonymously, thus becoming a serious problem. To contain the generation and diffusion of such undesired content, social network companies have had to deploy dedicated teams that constantly monitor the network and block this phenomenon. These monitoring and flagging tasks are mostly carried out by human employees, making the process inefficient: semi-automating this pipeline would allow for a significant speed-up and, consequently, better coverage of the content shared on social media. The research area that deals with this kind of task is Sentiment Analysis (SA), a sub-field of Natural Language Processing (NLP) that, combining tools and techniques from linguistics and Computer Science, aims at systematically identifying, extracting, and studying emotional states and personal opinions in natural language.

However, how can an algorithm know whether a text expresses positive or negative sentiment? This tricky question is not easy to answer, especially in the very complex and ambiguous domain of feelings, in which even humans sometimes struggle. Machine Learning algorithms can be applied to Natural Language Processing; still, the resulting models can be very complicated, becoming black-boxes that provide no information about how the sentiment classification task is performed. How can we trust the results of a black box? Trusting the model is necessary, especially if there is a need to deploy it on a large scale. eXplainable AI (XAI) is a recent research field that deals with this kind of issue.

Our research question is, therefore, the following: is it possible to equip SA algorithms with XAI techniques, in such a way that sentiment labels are explained, and in a computationally feasible way? We start by applying state-of-the-art explainability methods to the field of Sentiment Analysis.
Then we explore an attention-based method capable of extracting explanations which are both similar to the black-box predictions and computed in a small amount of time. We evaluate our method and other XAI techniques by comparing them with the original black-box model predictions, and we show examples where explanations are in contrast with black-box predictions.

2 Related Work

Recently, the use of neural networks for Natural Language Processing (NLP) tasks has become very popular [1]. The standard approach is to transform the input words into semantic vectors called word embeddings. These vectors can then be used as input for other algorithms: in our case, they are fed into a Sentiment Analysis classifier.

Currently, the most effective techniques to create word embeddings are Transformer models, which rely on the attention mechanism. The attention mechanism allows the model to look over all the information the original sentence holds and to build each word embedding according to its context. Transformer models incorporate this by encoding each word's position, so it is possible to link two very distant words [2]. The Transformer model utilised in this work is BERT [3], one of the most popular models in NLP. The suggested approach to sentiment classification is transfer learning. As indicated by the BERT authors, the learning procedure is divided into two parts: pre-training and fine-tuning. The pre-training phase is an unsupervised learning process: it consists of showing the model a large sentence corpus, masking a random word, and training the model to predict the embedding of the masked word (Generative Pre-Training [4]). The second part is the fine-tuning phase: it consists of stacking, after BERT, a linear layer that acts as a classifier and training the whole resulting model on a Sentiment Analysis dataset.

Explaining text classification might look like an easy task for a human, but not for a machine. The best performing models in text classification are deep neural networks composed of billions of parameters, and explaining such complex models is difficult or computationally expensive. Radford et al. [5] proposed an original approach to this problem. While training their linear model with L1 regularisation, they noticed it used surprisingly few of the learned units. Further analysis revealed a single "sentiment neuron" that was highly predictive of the sentiment value. Using the output of this neuron, they could create scores that explain each word's sentiment in a sentence.

In general, there are several types of methods to explain a text classification [6]. For our case, the most suitable is the feature-importance method. Sentiment Analysis often deals with binary sentiment classification: a negative sentence is labelled as 0, and a positive one as 1. The sentiment prediction task assigns a binary label to a sentence; the XAI method outputs a heatmap visualising the contribution of each word to the prediction, as shown in Figure 1.

Fig. 1: Example of a heatmap applied on text.

As a method for creating such an explanation, LIME (Local Interpretable Model-Agnostic Explanations) [7] relies on a straightforward intuition: the model may be very complex globally, but it is easier to approximate it around the vicinity of a particular instance. While treating the model as a black-box, LIME perturbs the instance and trains a local linear classifier. The weights of the linear interpretable classifier create the heatmap.

Integrated Gradients (IntGrad) [8] takes a different approach to the problem. Formally, suppose we have a function $F : \mathbb{R}^n \rightarrow [0, 1]$ that represents a deep network. Let $x$ be the input at hand, and $x_0$ be the baseline input: for image networks, the baseline could be the black image, while for text models, it could be the zero embedding vector. IntGrad considers the straight-line path from the baseline $x_0$ to the input $x$ and computes the gradients at all points along the path. Integrated gradients are obtained by accumulating these gradients: they are defined as the path integral of the gradients along the straight-line path from the baseline $x_0$ to the input $x$.
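To make the LIME procedure described above concrete, the following is a minimal sketch of how such a heatmap could be obtained with the lime library's LimeTextExplainer. It is not the setup used in this paper: black_box_proba is a hypothetical stand-in for any sentiment classifier that returns class probabilities for a list of sentences.

<pre>
import numpy as np
from lime.lime_text import LimeTextExplainer

def black_box_proba(texts):
    # Hypothetical placeholder: replace with the real model (e.g. a fine-tuned
    # BERT classifier) returning an array of shape (n, 2) with the probabilities
    # of the negative and positive class for each input sentence.
    rng = np.random.default_rng(0)
    pos = rng.random(len(texts))
    return np.column_stack([1.0 - pos, pos])

explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(
    "We never really feel involved with the story",
    black_box_proba,
    num_features=10,    # words to keep in the explanation
    num_samples=1000,   # size of the perturbed neighbourhood
)
# (word, weight) pairs from the local linear model: these weights form the heatmap.
print(explanation.as_list())
</pre>

The returned (word, weight) pairs are the coefficients of the local linear classifier, i.e. the kind of values a heatmap like the one in Figure 1 visualises.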
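Similarly, the path integral defining Integrated Gradients can be approximated with a simple Riemann sum along the straight-line path. The sketch below is an illustrative PyTorch implementation under the assumption that forward_from_embeddings is a differentiable function mapping a (sequence length × embedding size) tensor of word embeddings to the positive-class probability; the function name and the number of steps are ours, not the paper's.

<pre>
import torch

def integrated_gradients(forward_from_embeddings, x, baseline=None, steps=50):
    # Approximates IG_i(x) = (x_i - x0_i) * integral_0^1 dF(x0 + a*(x - x0))/dx_i da
    # with a Riemann sum over `steps` points on the straight-line path.
    if baseline is None:
        baseline = torch.zeros_like(x)  # zero embedding vector as the baseline x0
    total_grads = torch.zeros_like(x)
    for k in range(1, steps + 1):
        point = (baseline + (k / steps) * (x - baseline)).detach().requires_grad_(True)
        output = forward_from_embeddings(point)  # scalar positive-class probability
        output.backward()
        total_grads += point.grad
    # Average gradient along the path, scaled by the input-baseline difference.
    return (x - baseline) * total_grads / steps
</pre>

A per-word score can then be obtained by summing the resulting attributions over the embedding dimension.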
3 Methodology

In this section, we introduce our approach to extracting explanation scores from a sentiment analysis classifier in a computationally feasible way. The approach used here to classify documents is based on a pre-trained BERT model; since BERT uses attention scores, we exploit these scores for explainability purposes.

The BERT model is composed of two parts: the embedding-creation part and the classifier. The first creates a vector representation of the text, while the latter is built on top of the first and performs the classification. Since BERT is an attention-based model, we decided to add an attention layer between the two parts to gain better insight into the model's decisions, as shown in Figure 2.

Fig. 2: Adjustment applied to the BERT model. We inserted an attention layer between BERT (encoder) and the Classifier.

Through this attention layer, the model assigns the importance of each word for the prediction task by weighing the words when constructing the representation of the text. For instance, a word such as 'amazing' is likely to be very informative of the emotional meaning of a text, and it should thus be treated accordingly. We use a simple approach inspired by Bahdanau [9] and Yang [10], with a single parameter per input channel:

$$a_t = \frac{\exp(h_t \cdot w_a)}{\sum_{i=1}^{T} \exp(h_i \cdot w_a)} \quad (1) \qquad\qquad v = \sum_{t=1}^{T} a_t h_t \quad (2)$$

BERT outputs an embedding vector $h_t$ for every word $t$ of a sentence of $T$ words. The attention layer learns an importance score $a_t$ for each word by multiplying the representation $h_t$ with a weight vector $w_a$, learned during training. The output is normalised using a softmax function to construct a probability distribution over the words. Lastly, the output representation vector for the text, $v$, is computed as a weighted summation over all the words, using the attention importance scores as weights. This representation vector obtained from the attention layer is a high-level encoding of the entire text, and it is used as input to the classifier: a feed-forward layer with a sigmoid activation, $\sigma(W_{classifier} \cdot v)$.

The learned attention scores $a_t$ are the output of a softmax, so they are all positive and do not incorporate the signal from the classifier. To overcome this, we multiply the attention scores obtained from the attention layer by the sign of the classifier's response to each weighted word representation:

$$\hat{a}_t = a_t \cdot \mathrm{sign}\big(W_{classifier} \cdot (a_t h_t)\big)$$

This is our definition of explanation scores. For the learning phase, we freeze the BERT weights to those of the original pre-trained model and optimise only the attention and classification layers.
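As an illustration of the layer described in this section, the following is a minimal PyTorch sketch (hypothetical class and variable names, not the authors' code) of the attention pooling of Equations (1)-(2), the sigmoid classifier, and the signed explanation scores $\hat{a}_t$. It assumes h is the matrix of frozen BERT embeddings for one sentence.

<pre>
import torch
import torch.nn as nn

class AttentionExplainClassifier(nn.Module):
    """Attention layer (Eqs. 1-2) and sigmoid classifier on top of frozen BERT embeddings."""

    def __init__(self, hidden_dim=768):
        super().__init__()
        self.w_a = nn.Parameter(0.01 * torch.randn(hidden_dim))  # one parameter per input channel
        self.classifier = nn.Linear(hidden_dim, 1)

    def forward(self, h):
        # h: (T, hidden_dim) embeddings of the T words of a sentence.
        a = torch.softmax(h @ self.w_a, dim=0)        # Eq. (1): attention scores a_t
        v = (a.unsqueeze(1) * h).sum(dim=0)           # Eq. (2): weighted text representation v
        prob = torch.sigmoid(self.classifier(v))      # sigma(W_classifier . v)
        return prob, a

    def explanation_scores(self, h):
        # hat(a)_t = a_t * sign(W_classifier . (a_t h_t)), ignoring the classifier bias.
        _, a = self.forward(h)
        signed = torch.sign((a.unsqueeze(1) * h) @ self.classifier.weight.squeeze(0))
        return a * signed
</pre>

During training, only w_a and the classifier weights would be optimised, with the BERT encoder kept frozen, as described above.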
4 Experiments

Fig. 3: Rotten Tomatoes Movie Review Dataset examples; positive scores are expressed with a fresh tomato, negative ones with a rotten tomato.

Dataset. For our experiments we used movie reviews as data, in particular the Stanford Sentiment Treebank (SST) [11], which contains 11,855 sentences from movie reviews. Movie reviews are an excellent source for our task, since users provide both a text review and a score, thus creating a labelled dataset. The reviews were originally posted on http://rottentomatoes.com/, a popular movie fan website where users can comment and express a positive/negative sentiment using a fresh tomato or a rotten tomato (see Figure 3). We split the original dataset of 11,855 sentences into a training set (6,920 sentences), a validation set (872) and a test set (1,821).

Training. First, we trained our modified version of BERT as a sentiment classifier on the SST dataset mentioned above, adopting the hyperparameters described in the original paper [3]. Second, we used the trained BERT as a black-box classifier and labelled all reviews in the test set; we then computed the explanation scores for all labels by running LIME, IntGrad and Attention. Third, we compared these explanation scores with the ground-truth labels and with the predictions of the original black-box model on the test set. To do this, we have to produce a prediction from the explanation scores, so we built a simple classifier which computes the label $y_j$ of a sentence $j$ by taking the sign of the sum of the scores $s_{tj}$ of its words: $y_j = \mathrm{sign}\big(\sum_{t=0}^{N} s_{tj}\big)$.

Validation. To compare the three explanation scores, we used Fidelity [6], which captures how well each XAI technique can mimic the behaviour of the black-box model it is explaining. We also measured the similarity of the explanation scores with the ground-truth labels. Both evaluation criteria were measured using ROC AUC scores. The results are shown in Figure 4.

Fig. 4: Comparison between explainable methods: (a) explanation scores versus black-box predictions; (b) explanation scores and black-box prediction versus ground-truth labels (ROC curves; AUC values range from 0.92 to 1.00).

IntGrad is the XAI method that best replicates the model predictions and the one that works best on the test set. This technique seems the best choice, but it still suffers from high computational costs. We evaluated the time needed by each method to process the entire test set. The attention layer only needed two minutes. For IntGrad, we had to call the black-box model for every step from the baseline, so the time rises to 102 minutes. LIME is the method that performs worst in terms of time, with a total of 1440 minutes: for every sentence, it produces a synthetic neighbourhood, then calls the black-box model to label it and finally learns an interpretable local classifier. Our method is less accurate but much faster, as the attention layer is part of the black-box model and does not need any additional model calls.
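To illustrate this evaluation protocol, here is a small sketch (illustrative names and toy data, not the original evaluation code) of the sign-of-the-sum labelling rule and the two ROC AUC comparisons, using scikit-learn; as an assumption of this sketch, the raw sum of the word scores is used as the ranking score fed to roc_auc_score.

<pre>
import numpy as np
from sklearn.metrics import roc_auc_score

def sentence_scores(word_scores_per_sentence):
    # y_j = sign(sum_t s_tj); the raw sum is kept as a ranking score for the ROC curve.
    return np.array([np.sum(s) for s in word_scores_per_sentence])

def evaluate(word_scores_per_sentence, black_box_labels, ground_truth_labels):
    s = sentence_scores(word_scores_per_sentence)
    fidelity = roc_auc_score(black_box_labels, s)      # agreement with the black-box model
    accuracy = roc_auc_score(ground_truth_labels, s)   # agreement with the ground truth
    return fidelity, accuracy

# Toy usage: per-word explanation scores for three sentences.
scores = [np.array([0.2, -0.1, 0.4]),
          np.array([-0.3, -0.2]),
          np.array([0.1, 0.1, -0.05])]
print(evaluate(scores, black_box_labels=[1, 0, 1], ground_truth_labels=[1, 0, 0]))
</pre>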
To better appreciate the differences and similarities between the XAI methods, we show in Figure 5 an example of score assignment. The sentence is taken from the test set, and it has negative sentiment as ground truth: "We never really feel involved with the story, as all of its ideas remain just that: abstract ideas."

Fig. 5: Comparison between attention scores $\hat{a}$ (orange), LIME scores (blue), and Integrated Gradients scores (green) for the sentence: "We never really feel involved with the story, as all of its ideas remain just that: abstract ideas."

Sometimes, these XAI methods conflict with each other. In Figure 6, we show the box-plot of the distribution of the Pearson correlation coefficients between all the scores across the test set. We can see that overall the scores are strongly correlated with each other, but we have a non-negligible number of samples that are negatively correlated. Further analyses highlight how, in some cases, LIME, IntGrad, and attention are negatively correlated in different ways. In Figure 7, we can see two examples of this uncorrelated behaviour. The top chart shows the negative sentence "This is pretty dicey material", and we can see that neither LIME nor IntGrad could capture the particular use of the word dicey. In contrast, the bottom one shows the positive sentence "Not a bad journey at all", and in this case IntGrad fails at capturing the use of the adjective bad. In these examples, explanation scores conflict with the model predictions. Model-explanation concordance is, in general, not guaranteed and has to be taken into account when developing and evaluating XAI techniques.

Fig. 6: Box-plot of the Pearson correlation coefficients between XAI methods.

Fig. 7: Comparison between modified attention scores $\hat{a}$ and LIME scores for the sentences: "This is pretty dicey material." and "Not a bad journey at all."

5 Conclusions

In this work, we show how to use attention layers to extract explanation scores for model predictions. Our method can provide explanations matching the predictions of the black-box model while requiring a much lower computational time than the state-of-the-art benchmark methods LIME and IntGrad. We found that attention scores can be used to explore the internal behaviour of deep neural network models and have XAI capabilities comparable to other approaches, while requiring fewer computational resources. Choosing this approach is a matter of trade-off between performance and time. Different datasets, classification tasks, and black-box NLP models are to be considered in order to explore this trade-off further. We conclude that many XAI techniques can be applied to the field of NLP to better understand the sentiment classification process (and other NLP tasks in general). However, there is much room for improvement: XAI techniques can be an enabling factor for the explanation and deployment of ML models, but the current state of the art has not yet reached the maturity required to be applied at scale.

Acknowledgements

AP and AP acknowledge partial support from Research Project Casa Nel Parco (POR FESR 14/20 - CANP - Cod. 320 - 16 - Piattaforma Tecnologica Salute e Benessere) funded by Regione Piemonte in the context of the Regional Platform on Health and Wellbeing, and from Intesa Sanpaolo Innovation Center. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References
1. Richard Socher, Yoshua Bengio, and Christopher D. Manning. Deep learning for NLP (without magic). In Tutorial Abstracts of ACL 2012, pages 5–5, 2012.
2. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008, 2017.
3. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
4. Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding with unsupervised learning. Technical report, OpenAI, 2018.
5. Alec Radford, Rafal Jozefowicz, and Ilya Sutskever. Learning to generate reviews and discovering sentiment. arXiv preprint arXiv:1704.01444, 2017.
6. Riccardo Guidotti, Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and Dino Pedreschi. A survey of methods for explaining black box models. ACM Computing Surveys (CSUR), 51(5):1–42, 2018.
7. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?" Explaining the predictions of any classifier. In Proc. of the 22nd ACM SIGKDD Int.l Conf. on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
8. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In Proc. of the 34th Int.l Conf. on Machine Learning, volume 70, pages 3319–3328, 2017.
9. Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
10. Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. Hierarchical attention networks for document classification. In Proc. of the 2016 Conf. of the North American Chapter of the ACL: Human Language Technologies, pages 1480–1489, 2016.
11. Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc. of the 2013 Conf. on Empirical Methods in Natural Language Processing, pages 1631–1642, 2013.