<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Explainability Methods for Natural Language Processing: Applications to Sentiment Analysis (Discussion Paper)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Bodria</string-name>
          <email>francesco.bodria@sns.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andre Panisson</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alan Perotti</string-name>
          <email>alan.perottig@isi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Piaggesi</string-name>
          <email>simone.piaggesi2@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>ISI Foundation</institution>
          ,
          <addr-line>Turin</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Scuola Normale Superiore</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Università di Bologna</institution>
          ,
          <addr-line>Bologna</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Sentiment analysis is the process of classifying natural language sentences as expressing positive or negative sentiments, and it is a crucial task where the explanation of a prediction might arguably be as necessary as the prediction itself. We analysed different explanation techniques and applied them to the classification task of Sentiment Analysis. We explored how attention-based techniques can be exploited to extract meaningful sentiment scores at a lower computational cost than existing XAI methods.</p>
      </abstract>
      <kwd-group>
        <kwd>eXplainable Artificial Intelligence</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Sentiment Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The World Wide Web is a great invention, as it connects everyone in the world,
but at the same time it is a double-edged sword. In such an open-by-design
architecture, everyone can express their feelings, from joy to anger. In Social
Networks, negative sentiments can derail into hate speech, which can be shared easily,
quickly and anonymously, thus becoming a problem. To contain the generation
and diffusion of such undesired content, social network companies had to deploy
special teams that regularly watch the net and block this phenomenon. These
monitoring and flagging tasks are mostly carried out by human employees, thus
making the process inefficient: semi-automating this pipeline would allow for a
significant speed-up and, consequently, better coverage of the content shared on
social media. The research area that deals with this kind of task is Sentiment
Analysis (SA): Sentiment Analysis is a sub-field of Natural Language
Processing (NLP) that, combining tools and techniques from linguistics and Computer
Science, aims at systematically identifying, extracting, and studying emotional
states and personal opinions in natural language. However, how can an algorithm
know if a text is expressing a positive or negative sentiment? This tricky question
is not easy to answer, especially in the very complex and ambiguous domain of
feelings, in which even humans sometimes struggle. Machine Learning algorithms
can be applied to Natural Language Processing; still, the resulting models can
be very complicated, becoming black-boxes that provide no information about
how the sentiment classification task is performed. How can we trust the results
of a black box? Trusting the model is necessary, especially if there is a need to
deploy it on a large scale. eXplainable AI (XAI) is a recent research field that
deals with this kind of issue.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0). This volume is published
and copyrighted by its editors. SEBD 2020, June 21-24, 2020, Villasimius, Italy.</p>
      <p>Our research question is therefore the following: is it possible to equip SA
algorithms with XAI techniques in such a way that sentiment labels are
explained, in a computationally feasible way?</p>
      <p>We start by applying state-of-the-art explainability methods to the field of
Sentiment Analysis. Then we explore an attention-based method capable of
extracting explanations which are both similar to the black-box predictions and
computed in a small amount of time. We evaluate our method and other XAI
techniques by comparing them with the original black-box model predictions, and
show examples where explanations are in contrast with black-box predictions.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Recently, the use of neural networks in the development of Natural Language
Processing (NLP) tasks has become very popular [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The standard approach is
to transform the input words into semantic vectors called Word Embeddings.
These vectors can then be used as input for other algorithms: in our case, they
will be fed into a Sentiment Analysis classifier.
      </p>
      <p>
        Currently, the most effective techniques for creating Word Embeddings are
Transformer models, which rely on the attention mechanism. The attention
mechanism allows the model to look over all the information the original sentence holds
and then create the proper word embedding according to the context.
Transformer models incorporate this by encoding each word's position, so it is possible
to link two very distant words [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The Transformer model utilised in this work
is BERT [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which is one of the most popular models in NLP. The suggested
approach to Sentiment Classification is to apply Transfer Learning. As
indicated by the BERT authors, the learning procedure is divided into two parts:
Pre-Training and Fine-Tuning. The Pre-Training phase is an unsupervised
learning process: the model is shown a large sentence corpus with a random word
masked, and it tries to predict the embedding of the masked
input (Generative Pre-Training [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]). The second part is called the Fine-Tuning
phase: a linear layer acting as a classifier is stacked on top of BERT,
and the whole resulting model is trained on a Sentiment Analysis dataset.
      </p>
      <p>
        Explaining a text classification might look like an easy task for a human, but
not for a machine. The best-performing models in Text Classification are deep
neural networks composed of billions of parameters, and explaining such complex
models is difficult or computationally expensive. Radford et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed an
original approach to this problem. While training their linear model with L1
regularisation, they noticed it used surprisingly few of the learned units. Further
analysis revealed a single "sentiment neuron" that was highly predictive of the
sentiment value. Using the output of this neuron, they could create scores that
explain each word's sentiment in a sentence. In general, there are several types
of methods to explain a text classification [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. For our case, the most suitable is
the feature importance method.
      </p>
      <p>Sentiment Analysis often deals with binary sentiment classification: a negative
sentence is labelled as 0, and a positive one as 1. The sentiment prediction task
provides a binary label for a sentence. The XAI method outputs a heatmap
visualising the contribution of each word to the prediction, as shown in Figure 1.</p>
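      <p>As a toy illustration of such a word-level heatmap, one can mark each word with a number of +/- symbols proportional to its score, as a plain-text stand-in for the colour map of Figure 1. The words and scores below are made-up examples, not output of the actual models.</p>
      <preformat>
```python
# Toy rendering of a word-level "heatmap": mark each word with +/- symbols
# proportional to its explanation score (a text stand-in for the colour map
# of Figure 1). Words and scores below are made-up examples.
def render_heatmap(words, scores):
    out = []
    for w, s in zip(words, scores):
        n = min(3, int(abs(s) * 10))          # one marker per 0.1, capped at 3
        mark = ("+" if s > 0 else "-") * n
        out.append(f"{w}[{mark}]" if mark else w)
    return " ".join(out)

line = render_heatmap(["an", "amazing", "movie"], [0.02, 0.35, 0.05])
```
      </preformat>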
      <p>
        As a method for creating such an explanation, LIME (Local Interpretable
Model-Agnostic Explanations) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] relies on a straightforward intuition: the model
may be very complex globally, but it is easier to approximate in the
vicinity of a particular instance. While treating the model as a black-box, LIME
perturbs the instance and trains a local linear classifier. The weights of this
interpretable linear classifier create the heatmap.
      </p>
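      <p>The local-surrogate idea can be sketched in a few lines. The snippet below is an illustrative toy, not the original LIME implementation: toy_black_box is a hypothetical lexicon-based scorer standing in for the real sentiment classifier, and the surrogate is fitted by plain least squares rather than LIME's weighted regression.</p>
      <preformat>
```python
# Minimal LIME-style sketch for text: mask random subsets of words, query
# the black box on each perturbed sentence, and fit a local linear
# surrogate whose coefficients serve as per-word explanation scores.
import random
import numpy as np

def toy_black_box(words):
    # Hypothetical lexicon-based scorer standing in for the real model.
    lexicon = {"amazing": 1.0, "great": 0.8, "bad": -0.9, "boring": -0.7}
    score = sum(lexicon.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + np.exp(-score))        # probability of "positive"

def lime_text(words, black_box, n_samples=500, seed=0):
    rng = random.Random(seed)
    masks, preds = [], []
    for _ in range(n_samples):
        keep = [rng.random() > 0.5 for _ in words]   # drop each word at random
        masks.append([1.0 if k else 0.0 for k in keep])
        preds.append(black_box([w for w, k in zip(words, keep) if k]))
    X = np.array(masks)
    y = np.array(preds)
    X1 = np.hstack([X, np.ones((len(y), 1))])        # add intercept column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)    # local linear surrogate
    return dict(zip(words, coef[:-1]))               # word -> importance

scores = lime_text("a truly amazing but boring movie".split(), toy_black_box)
```
      </preformat>
      <p>Note that every perturbed sentence requires one call to the black-box model, which is why this family of methods is computationally expensive.</p>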
      <p>
        Integrated Gradients [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is another technique, with a different approach
to the problem. Formally, suppose we have a function F : R^n → [0, 1] that
represents a deep network. Let x be the input at hand, and x_0 be the
baseline input. For image networks, the baseline could be the black image, while
for text models, it could be the zero embedding vector. IntGrad considers the
straight-line path from the baseline x_0 to the input x and computes the gradients
at all points along the path. Integrated gradients are obtained by accumulating
these gradients: they are defined as the path integral
of the gradients along the straight-line path from the baseline x_0 to the input x.
      </p>
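      <p>The path-integral definition can be approximated numerically. The snippet below is an illustrative toy: F is a hand-built logistic model with an analytic gradient, standing in for a deep network with autodiff, and w, x and the zero baseline x_0 are made-up values.</p>
      <preformat>
```python
# Integrated Gradients sketch on a toy model F(x) = sigmoid(w . x), with an
# analytic gradient standing in for autodiff on a deep network.
import numpy as np

def F(x, w):
    return 1.0 / (1.0 + np.exp(-(x @ w)))

def grad_F(x, w):
    s = F(x, w)
    return s * (1.0 - s) * w                   # dF/dx of the logistic model

def integrated_gradients(x, x0, w, steps=200):
    # Midpoint-rule approximation of the path integral of the gradients
    # along the straight line from the baseline x0 to the input x.
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_F(x0 + a * (x - x0), w)
    return (x - x0) * total / steps

w = np.array([2.0, -1.0, 0.5])
x = np.array([1.0, 1.0, 1.0])
x0 = np.zeros(3)                   # zero-embedding baseline, as for text models
attr = integrated_gradients(x, x0, w)
# Completeness axiom: the attributions sum to F(x) - F(x0).
```
      </preformat>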
    </sec>
    <sec id="sec-3">
      <title>Methodology</title>
      <p>In this section, we introduce our approach to extracting explanation scores from
a sentiment analysis classifier in a computationally feasible way. The approach
used here to classify documents is based on a pre-trained BERT model. Since
BERT uses attention scores, we exploit these scores for explainability purposes.</p>
      <p>Two parts compose the BERT model: the embedding-creation part and the
classifier. The first creates a vector representation of the text, while the latter
is built on top of the first and performs the classification itself. Since BERT is
an attention-based model, we decided to add an attention layer between the two
parts to gain better insight into the model's decisions, as shown in Figure 2.</p>
      <p>
        Through this attention layer, the model assigns an importance to each word
for the prediction task by weighting the words when constructing the representation of
the text. For instance, a word such as `amazing' is likely to be very informative
of the emotional meaning of a text, and it should thus be treated accordingly.
We use a simple approach inspired by Bahdanau [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and Yang [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] with a single
parameter per input channel:
(1)   a_t = exp(h_t w_a) / Σ_{i=1}^{T} exp(h_i w_a)
(2)   v = Σ_{t=1}^{T} a_t h_t
      </p>
      <p>
BERT outputs an embedding vector h_t for every word t of a sentence of T words.
The attention layer learns an importance score a_t for each word by multiplying
the representation h_t with a weight vector w_a, learned during training. The
output is normalised using a softmax function to construct a probability
distribution over the words. Lastly, the output representation vector for the text, v,
is computed as a weighted sum over all the words, using the attention
importance scores as weights. This representation vector obtained from the
attention layer is a high-level encoding of the entire text, and it is used as input to
the classifier: a feed-forward layer with a sigmoid activation, σ(W_classifier v).</p>
      <p>The learned attention scores a_t are the output of a softmax, so they are all
positive and do not incorporate the signal from the classifier. To overcome this,
we combine the attention scores obtained from the attention layer with the weights
of the classifier: we take the sign of the classifier's response and multiply it by the scores:</p>
      <p>â_t = a_t · sign(W_classifier (a_t h_t))
This is our definition of the explanation scores.</p>
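      <p>The attention pooling of Eqs. (1)-(2) and the signed explanation scores can be sketched in a few lines of numpy. In this sketch H, wa and Wc are random stand-ins for the BERT embeddings h_t and the trained weights w_a and W_classifier; it only illustrates the shapes and the computation, not the trained model.</p>
      <preformat>
```python
# Sketch of the attention layer, Eqs. (1)-(2), and the signed explanation
# scores, with random stand-ins for embeddings and learned weights.
import numpy as np

rng = np.random.default_rng(0)
T, d = 6, 8                           # sentence length, embedding size
H = rng.normal(size=(T, d))           # word embeddings h_t from BERT
wa = rng.normal(size=d)               # attention weight vector w_a
Wc = rng.normal(size=d)               # classifier weights W_classifier

e = np.exp(H @ wa)
a = e / e.sum()                       # Eq. (1): softmax attention over words
v = a @ H                             # Eq. (2): weighted sum, text representation
p = 1.0 / (1.0 + np.exp(-(Wc @ v)))   # sigmoid classifier output

# Explanation scores: attach the classifier's sign to each attention weight.
a_hat = a * np.sign((a[:, None] * H) @ Wc)
```
      </preformat>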
      <p>For the learning phase, we freeze the BERT weights, as in the original pre-trained
model, and optimise only the attention and classification layers.</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>
        Dataset. For our experiments we used movie reviews as data; in particular,
the Stanford Sentiment Treebank (SST) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] which contains 11,855 sentences from
movie reviews. Movie reviews are an excellent source for our task, since users
provide both a text review and a score, thus creating a labelled dataset.
      </p>
      <p>The movie reviews were originally posted on http://rottentomatoes.com/,
a popular movie-fan website where users can comment and express a
positive/negative sentiment using a fresh or a rotten tomato (see
Figure 3). We split the original dataset of 11,855 sentences into a training set
(6920), a validation set (872) and a test set (1821).</p>
      <p>
        Training. First, we trained our modified version of BERT as a sentiment
classifier on the SST dataset mentioned above, adopting the hyperparameters
described in the original paper [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Second, we used the trained BERT as a black-box classi er and labelled all
reviews in the test set; we then computed the explanation scores for all labels by
running LIME, IntGrad and Attention.</p>
      <p>
        Third, we compared these explanation scores with the ground-truth labels and
with the predictions of the original black-box model on the test set. To do this, we
had to produce a prediction from the explanation scores, so we built a simple
classifier which computes the label y of a sentence j by taking the sign of the sum
of the scores s of the words t of the sentence: y_j = sign(Σ_{t=0}^{N} s_t^j).</p>
      <p>Validation. To compare the three explanation scores, we used Fidelity [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
which captures how much each XAI technique can mimic the behaviour of the
black-box model it is explaining. We also measured the similarity of our explanation
scores to the ground-truth labels. Both evaluation criteria were measured
using ROC AUC scores. The results are shown in Figure 4.
      </p>
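      <p>The sign-of-sum labelling and the fidelity comparison can be sketched as follows. This is a minimal illustrative sketch: the toy data and the rank-based AUC helper are assumptions of ours, not the paper's code or actual results.</p>
      <preformat>
```python
# Sketch of the evaluation step: label each sentence with the sign of the
# sum of its word scores, then measure fidelity to the black-box labels
# with a minimal rank-based ROC AUC.
def label_from_scores(word_scores):
    return 1 if sum(word_scores) > 0 else 0

def roc_auc(labels, scores):
    # Probability that a random positive is ranked above a random negative.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Per-word explanation scores for four toy sentences, plus black-box labels.
explanations = [[0.4, -0.1, 0.2], [-0.5, 0.1], [0.3, 0.3], [-0.2, -0.4]]
blackbox_labels = [1, 0, 1, 0]
preds = [label_from_scores(e) for e in explanations]
sentence_scores = [sum(e) for e in explanations]
fidelity_auc = roc_auc(blackbox_labels, sentence_scores)
```
      </preformat>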
      <p>IntGrad is the XAI method that best replicates the model predictions and
the one that performs best on the test set. This technique seems the best choice,
but it still suffers from high computational costs. We evaluated the time each
method took over the entirety of the test set. For the attention layer,
we only needed two minutes. For IntGrad, we had to call the black-box model
for every step from the baseline, so the time rises to 102 minutes. LIME is the
method that performs worst in terms of time, with a total of 1440 minutes: for
every sentence, we produce a synthetic neighbourhood, then call the black-box
model to label it, and finally learn an interpretable local classifier. Our method
is less accurate but much faster, as the attention layer is part of the black-box
model and does not need any additional model call.</p>
      <p>To better appreciate the differences and similarities between the XAI methods,
we show in Figure 5 an example of score assignment. The sentence is taken from
the test set, and it has negative sentiment as ground truth: "We never really feel
involved with the story, as all of its ideas remain just that: abstract ideas".
[Figure 5: (b) explanation scores (IntGrad, Attention, LIME) and black-box
prediction versus ground truth labels.]</p>
      <p>Sometimes, these XAI methods conflict with each other. In Figure 6, we
show the box-plot of the distribution of correlation coefficients between all the
scores along the test dataset. We can see that overall the scores are strongly
correlated with each other, but there is a non-negligible number of samples that
are negatively correlated. Further analyses highlight how, in some cases, LIME,
IntGrad, and attention are negatively correlated in different ways. In Figure 7
(comparing the modified attention scores â with the LIME scores), we can see two
different examples of this uncorrelated behaviour. The top chart shows the
negative sentence "This is pretty dicey material", and we can see that neither
LIME nor IntGrad could capture the particular use of the word dicey. In
contrast, the bottom one shows the positive sentence "Not a bad journey at all",
and in this case, IntGrad fails at capturing the use of the adjective bad. In these
examples, explanation scores conflict with the model predictions. The
model-explanation concordance is, in general, not guaranteed and has to be taken into
account when developing and evaluating XAI techniques.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this work, we showed how to use attention layers to extract explanation scores
for model predictions. Our method can provide explanations matching the
predictions of the black-box model while requiring much less computational
time than the state-of-the-art benchmark methods LIME and IntGrad. We
found that attention scores can be used to explore the internal behaviour of
deep neural network models, with XAI capabilities comparable to other
approaches but requiring fewer computational resources. Choosing this approach is a
matter of trade-off between performance and time. Different datasets,
classification tasks, and black-box NLP models should be considered in order to explore
this trade-off further.</p>
      <p>We conclude that many XAI techniques can be applied to the field of NLP
to better understand the sentiment classification process (and other NLP tasks
in general). However, there is much room for improvement: XAI techniques can
be an enabling factor for the explanation and deployment of ML models, but the
current state of the art has not yet reached the maturity needed to apply them
at scale.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>AP and AP acknowledge partial support from Research Project Casa Nel Parco
(POR FESR 14/20 - CANP - Cod. 320 - 16 - Piattaforma Tecnologica Salute e
Benessere) funded by Regione Piemonte in the context of the Regional Platform
on Health and Wellbeing and from Intesa Sanpaolo Innovation Center. The
funders had no role in study design, data collection and analysis, decision to publish,
or preparation of the manuscript.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Richard Socher, Yoshua Bengio, and
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <article-title>Deep learning for NLP (without magic)</article-title>
          .
          <source>In Tutorial Abstracts of ACL</source>
          <year>2012</year>
          , pages
          <issue>5</issue>
          –
          <fpage>5</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez,
          <string-name>
            <given-names>Lukasz</given-names>
            <surname>Kaiser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <article-title>Attention is all you need</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <volume>5998</volume>
          –
          <fpage>6008</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Karthik Narasimhan, Tim Salimans, and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <article-title>Improving language understanding with unsupervised learning</article-title>
          .
          <source>Tech.Rep., OpenAI</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Rafal Jozefowicz, and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <article-title>Learning to generate reviews and discovering sentiment</article-title>
          .
          <source>arXiv preprint arXiv:1704.01444</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Riccardo</given-names>
            <surname>Guidotti</surname>
          </string-name>
          , Anna Monreale, Salvatore Ruggieri, Franco Turini, Fosca Giannotti, and
          <string-name>
            <given-names>Dino</given-names>
            <surname>Pedreschi</surname>
          </string-name>
          .
          <article-title>A survey of methods for explaining black box models</article-title>
          .
          <source>ACM computing surveys (CSUR)</source>
          ,
          <volume>51</volume>
          (
          <issue>5</issue>
          ):1–
          <fpage>42</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Marco Tulio</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sameer</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Carlos</given-names>
            <surname>Guestrin</surname>
          </string-name>
          .
          <article-title>"Why should I trust you?": Explaining the predictions of any classifier</article-title>
          .
          <source>In Proc. of the 22nd ACM SIGKDD Int.l Conf. on Knowledge Discovery and Data Mining</source>
          , pages
          <volume>1135</volume>
          –
          <fpage>1144</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Mukund</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          , Ankur Taly, and
          <string-name>
            <given-names>Qiqi</given-names>
            <surname>Yan</surname>
          </string-name>
          .
          <article-title>Axiomatic attribution for deep networks</article-title>
          .
          <source>In Proc. of the 34th Int.l Conf. on Machine Learning</source>
          , volume
          <volume>70</volume>
          , pages
          <fpage>3319</fpage>
          –
          <fpage>3328</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Dzmitry</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          , Kyunghyun Cho, and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>arXiv preprint arXiv:1409.0473</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Zichao</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Diyi</given-names>
            <surname>Yang</surname>
          </string-name>
          , Chris Dyer, Xiaodong He,
          <string-name>
            <given-names>Alex</given-names>
            <surname>Smola</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Eduard</given-names>
            <surname>Hovy</surname>
          </string-name>
          .
          <article-title>Hierarchical attention networks for document classification</article-title>
          .
          <source>In Proc. of the 2016 Conf. of the North American chapter of the ACL: Human Language Technologies</source>
          , pages
          <volume>1480</volume>
          –
          <fpage>1489</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang,
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          , Andrew Y Ng, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Potts</surname>
          </string-name>
          .
          <article-title>Recursive deep models for semantic compositionality over a sentiment treebank</article-title>
          .
          <source>In Proc. of the 2013 Conf. on Empirical Methods in Natural Language Processing</source>
          , pages
          <volume>1631</volume>
          –
          <fpage>1642</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>