Incorporating Context and Knowledge for Better Sentiment Analysis of Narrative Text

Chenyang Lyu (School of Computing, Dublin City University, Dublin, Ireland, chenyang.lyu2@mail.dcu.ie)
Tianbo Ji (ADAPT Centre, Dublin City University, Dublin, Ireland, tianbo.ji2@mail.dcu.ie)
Yvette Graham (ADAPT Centre, Dublin City University, Dublin, Ireland, yvette.graham@dcu.ie)

Abstract

Recent years have witnessed a significant increase in the availability of online narrative texts, such as news articles and online stories. Automatic sentiment classification of such narrative texts will enable better organisation, search and retrieval. However, despite much work carried out to date on sentiment analysis of social media and user feedback, little research has addressed the automatic prediction of the sentiment of narrative text. In this paper, we present an approach to sentiment analysis of narrative text that employs a pre-trained language model, an approach already proven effective for a range of other NLP tasks. For the purpose of sentiment analysis of narrative text in particular, we introduce two new features: a contextual feature and an extra knowledge feature that aid text understanding for the prediction of sentiment. We conduct preliminary experiments on a publicly available dataset of fairy tale texts. Results show an increase in accuracy over a vanilla pre-trained language model and baseline models for classification of sentiment for narrative texts.

1 Introduction

Sentiment analysis refers to the analysis and prediction of the sentiment and emotion in language. Most research carried out to date in this domain has been based on user-generated content such as social media and user feedback [VC12]. Recently, with the development of online narrative resources such as web fiction and news, interest in sentiment analysis of narrative text has been slowly expanding [Moh13], in order to enable better organisation and retrieval.
The sentiment and emotion of narrative text can be characterised as a general feeling reflected in the text, and in the current work we use sentiment to stand for both sentiment and emotion. Contrary to other types of text, narrative is a perceived sequence of non-randomly connected events [Too12]. For example:

    A countryman's son stepped on a snake's tail accidentally. The tail suddenly turned and hit him so that he died. The father was very angry so that he cut off part of the snake's tail.

is a piece of narrative text1 composed of connected events. Due to its complexity, analysing the sentiment of narrative text requires a deep understanding of semantic structure and opinions. Since the events that impact the plot of narrative texts are commonly interrelated, it is all the more important to take context sufficiently into consideration. Furthermore, several linguistic phenomena, such as the metaphor in the sentence The lack of having a proper standing army is the Achilles heel for this small country, as well as terminology, require additional knowledge in order to be correctly interpreted. Incorporating such additional knowledge has the potential to help predict the sentiment of text. Nowadays, pre-trained language models, such as those in [DCLT19], [PNI+18] and [RNSS], have proven effective for a range of syntactic and semantic tasks, such as textual similarity, sentiment classification and textual entailment. These models are usually trained on large-scale corpora and have the ability to capture and model semantic dependencies over long text.

Copyright © by the paper's authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia (eds.): Proceedings of the Text2Story'20 Workshop, Lisbon, Portugal, 14-April-2020, published at http://ceur-ws.org

1 http://englishjuniorhighschool.blogspot.com/2012/04/countryman-and-snake-countrymans-son.html
Such pre-trained models have the potential to detect and classify the sentiment of narrative text. In this paper, we propose a model that employs a state-of-the-art pre-trained language model to classify the sentiment class of narrative text. The model takes context into consideration as well as additional knowledge as features to predict the sentiment class. Firstly, what we call context features are used to generate a context-aware representation of a given sentence, before it is used to determine its sentiment class. We provide a detailed description of the methodology and experiments investigating the accuracy of our approach in Sections 3 and 4, respectively.

2 Related Work

Sentiment analysis is commonly regarded as a text classification or text mining problem, and there are two main approaches to it: a rule-based approach and a learning-based approach. Early work on sentiment analysis attempted rule-based approaches involving a pre-defined lexicon with corresponding sentiment polarities to predict the polarity of a given text by capturing word occurrences [HS00, DC07, WWB+04]. Besides lexicon methods, Zhang et al. [ZZL+09] proposed a document-level sentiment analysis framework, which determines the sentiment polarity of sentences based on word dependency before aggregating sentences for document-level sentiment prediction; in that framework, single-sentence polarity is computed independently of context. However, the sentiment of text can be expressed in a more subtle manner than such approaches are able to detect, since subtle cases are difficult to classify based only on indicative lexical features. Machine learning algorithms have the potential to address the shortcomings of rule-based approaches. However, learning-based approaches require a labeled training set on which to be trained before predicting new samples.
Pang et al. [PLV02] proposed three machine learning models, Naïve Bayes, Support Vector Machines and Maximum Entropy, to predict sentiment based on bag-of-words features. [NIK10] proposed a dependency tree-based model to capture interactions between words, while [VCB14] and [KF16] apply a range of kernel methods. Sentiment analysis with neural networks has a long history. For example, Socher et al. [SPW+13] proposed Recursive Neural Tensor Networks (RNTNs) to model sentence-level sentiment recursively from the syntactic tree. [dSG14], on the other hand, adopted Convolutional Neural Networks (CNNs) to classify sentiment for short text, their model outperforming traditional machine learning models such as Naïve Bayes and SVM. Some alternative approaches using Recurrent Neural Networks have also been proposed, such as [WHZ+16] and [ZWX16], the latter employing attention-based LSTM models. Recently, large-scale pre-trained models have emerged and have been shown to be very effective on various tasks [PNI+18, RNSS, DCLT19], including reading comprehension and sentiment classification. Pre-trained models are trained on large unlabeled corpora in an unsupervised setting; after pre-training, they are able to capture rich semantic patterns and syntactic information from text [JSS19]. For instance, the noun modifier (The former) in the sentence The 45-year-old former General Electric Co. executive figures it will be easier this time. can be clearly attended to by its noun (executive) through the self-attention mechanism in BERT [CKLM19]. Pre-trained models are thus well suited to modelling the sentiment of text; we therefore employ BERT (Bidirectional Encoder Representations from Transformers) for our task because of its capability to model bidirectional dependencies in text.

3 Methodology

Most current state-of-the-art pre-trained language models are based on a neural architecture known as the transformer [VSP+17], originally proposed by Vaswani et al. In this study, we use the pre-trained model BERT [WDS+19].
The BERT [DCLT19] we use has been pre-trained on a large unlabeled corpus containing over 3 billion words. This large volume of data enables it to model complex language dependencies and produce very rich representations of text. The input is represented as document = {s_1, ..., s_n}, s_i = {w_{i,1}, ..., w_{i,m}}, where s_i is the i-th sentence in the document.

Figure 1: Model Architecture for Sentence-Level Sentiment Classification Incorporating Contextual Feature and Knowledge. The blue arrows stand for contextual vectors from previous sentences.

3.1 BERT: Bidirectional Encoder Representations from Transformers

We use BERT as an encoder to generate the sentiment label for a single sentence. BERT is based on a transformer block containing one self-attention layer, two layer norms and one feed-forward layer. The transformer we employ consists of 12 stacked transformer blocks, and the output of its last layer is used to obtain the sentiment label. The overall process of computing the sentiment label for a single sentence can be formulated as follows:

    h_l = bert_encoder(w_1, w_2, ..., w_k)    (1)

    y = softmax(linear_layer(pooling_layer(h_l), context))    (2)

where h_l is the hidden state vector from the last layer; it is fed into the pooling layer2 and the linear layer, then passed into the softmax layer together with the context vector (introduced in Section 3.2) to generate the sentiment label y. The overall architecture of our model is shown in Figure 1.

3.2 Contextual Feature

Usually, the sentiment label of a single sentence in a document is conditioned only on the sentence itself:

    sentiment_label = argmax_y p(y | s_j)    (3)

which does not take advantage of the context. However, we argue that in narrative, the context of a given sentence plays a crucial role in determining its semantic meaning as well as its sentiment label.
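The classification head described by Equations (1) and (2) can be illustrated with a minimal, self-contained sketch. The hidden states, weights and class count below are toy stand-ins (the real ones come from a fine-tuned 12-layer BERT encoder); only the pooling-select-linear-softmax pipeline mirrors the paper's formulation.

```python
import math

def pooling_layer(hidden_states):
    # Select the representation of the [CLS] token (position 0),
    # as described in the paper's footnote on the huggingface pooling layer.
    return hidden_states[0]

def linear_layer(vec, weights, bias):
    # One row of coefficients per sentiment class.
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example: 3 tokens with 4-dim hidden states from a hypothetical
# bert_encoder, classified into 2 classes (emotional / non-emotional).
hidden = [[0.2, -0.1, 0.5, 0.3],   # [CLS]
          [0.9,  0.4, 0.1, 0.0],
          [0.3,  0.2, 0.7, 0.1]]
W = [[1.0, 0.0, 0.5, 0.0],
     [0.0, 1.0, 0.0, 0.5]]
b = [0.0, 0.0]

probs = softmax(linear_layer(pooling_layer(hidden), W, b))
label = probs.index(max(probs))
```

In practice the same head is applied to the pooled output of a fine-tuned BERT; the sketch only makes the order of operations in Equation (2) concrete.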
We propose to incorporate context features when computing the sentiment label for a sentence by combining the representation vectors of the context sentences of that sentence. In this way, the sentiment label is given as follows:

    sentiment_label = argmax_y p(y | s_j, context_j)    (4)

    context_j = f(s_{j-t}, s_{j-t+1}, ..., s_{j-1})    (5)

where t is the number of sentences we incorporate in the context and f is the function we use to combine sentence vectors. In our framework, we use a linear combination of the context sentences and the target sentence to generate the sentiment label. In this work, we use the sentence immediately preceding the current sentence as its context, as it has the most significant effect on the sentiment of the current sentence.

3.3 Extra Knowledge

Besides the context feature, we propose to use extra knowledge to improve understanding of the sentiment class of text. In this paper, we incorporate entity information and idiom information into our model. We use the idiom dataset SLIDE from [JBBHS18], which includes over 5,000 idioms with sentiment annotation. In the SLIDE dataset, every idiom has a sentiment distribution with three entries; each entry ranges from 0 to 1 and the three entries sum to 1. In our framework, the extra knowledge is represented by a sentiment-term matrix: each entry for a given term represents its inclination towards the corresponding sentiment label, and all entries for a term sum to one. The sentiment-term matrix is defined as:

    M_{i,j} = p(y_j | term_i),  with  sum_{j=1}^{z} M_{i,j} = 1    (6)

In our model, we incorporate extra knowledge as follows:

    p(y | s_j, context_j) = softmax(normalize(h_j) + sum_{i=1}^{v} M_i)    (7)

where M_i is the sentiment distribution vector of the i-th term attended in the given sentence, and v is the number of attended terms.

2 We use "pooling layer" in accordance with huggingface; this layer simply selects the representation vector of the token [CLS], a special token added in front of every sentence.
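The knowledge incorporation of Equations (6) and (7) can be sketched as a simple lookup-and-add before the softmax. The sentiment-term matrix below contains hypothetical entries (SLIDE's actual distributions differ), and term matching is reduced to naive substring search purely for illustration.

```python
import math

# Hypothetical sentiment-term matrix: one row per idiom, three columns
# (positive, negative, neutral), each row summing to 1 as in Eq. (6).
SENTIMENT_TERM_MATRIX = {
    "achilles heel": [0.05, 0.85, 0.10],
    "over the moon": [0.90, 0.02, 0.08],
}

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def knowledge_adjusted_probs(hidden_logits, sentence):
    """Add the sentiment distribution of every matched term to the
    normalized hidden-state vector before the softmax, as in Eq. (7)."""
    text = sentence.lower()
    adjusted = list(hidden_logits)
    for term, dist in SENTIMENT_TERM_MATRIX.items():
        if term in text:  # naive matching; real systems need tokenization
            adjusted = [a + d for a, d in zip(adjusted, dist)]
    return softmax(adjusted)

# A sentence containing a matched idiom shifts probability mass
# towards the idiom's annotated sentiment.
h = [0.0, 0.0, 0.0]  # stand-in for normalize(h_j)
probs = knowledge_adjusted_probs(h, "That is the Achilles heel of the plan.")
```

With a neutral hidden state, the matched idiom's distribution dominates and the predicted class follows the lexicon entry, which is the intended effect of the additive term in Equation (7).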
We first normalize the hidden state h_j and then add the normalized h_j and all term vectors {M_i}_{i=1}^{v} that affect the sentiment of the text. Our normalize function normalizes the hidden state and scales it by a factor λ, such that the ratio of the l1 norms of h_j and M_i equals λ. In our experiments, we set λ to 10.

4 Experiments and Results

4.1 Data

For a preliminary validation of our methodology, we evaluate our proposed model with different features on the dataset of [Alm08]. This dataset includes 176 fairy tales from three sets of children's stories: Beatrix Potter (19 stories), H. C. Andersen (77 stories), and the Brothers Grimm (80 stories). In total, 15,302 sentences are annotated with sentence-level emotion labels. The label set consists of eight labels categorized into emotional and non-emotional. The label distribution is shown in Table 1. The dataset is imbalanced: NEUTRAL is most frequent at 66.26%.

Table 1: Label distribution in the dataset, where H is HAPPY, Su+ is POSITIVELY SURPRISED, F is FEARFUL, A is ANGRY, Sa is SAD, D is DISGUSTED, Su- is NEGATIVELY SURPRISED, N is NEUTRAL

                          Emotional                          Non-Emotional
  H        Su+      F        A        Sa       D        Su-       N
  10.52%   2.15%    4.55%    4.77%    5.43%    3.03%    3.29%     66.26%

4.2 Experiment Setup

We first split the dataset described in Section 4.1 into training and test sets. Since our study takes context into consideration, we select 10% of the stories from each set of stories as the test set. Our test set contains 16 stories, 1,189 sentences in total. The label distribution in the training and test sets is shown in Table 2.

Table 2: Label distribution in training set and test set.

  label          H        Su+     F       A        Sa      D       Su-     N
  training set   10.58%   2.12%   4.44%   4.32%    5.31%   3.08%   3.19%   66.96%
  test set       9.84%    2.52%   5.88%   10.17%   6.90%   2.35%   4.46%   57.86%

The pre-trained model we use is BERT-Base-Cased3.
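One plausible implementation of the l1-norm-ratio normalization described above can be sketched as follows. The paper does not spell out the exact function, so this is an assumption: the hidden state is rescaled so that its l1 norm equals λ times the l1 norm of a term vector (which is 1 for SLIDE-style distributions that sum to one).

```python
def l1_norm(vec):
    return sum(abs(x) for x in vec)

def normalize(h, m_norm=1.0, factor=10.0):
    """Rescale hidden state h so that l1(h) / m_norm == factor.

    `factor` plays the role of the scaling factor λ (set to 10 in the
    paper); `m_norm` is the l1 norm of a term's sentiment distribution,
    which equals 1 when the distribution's entries sum to one.
    This is a sketch of one possible reading of the paper's normalize
    function, not a confirmed reproduction of it.
    """
    norm = l1_norm(h)
    if norm == 0.0:
        return list(h)  # avoid division by zero for an all-zero vector
    scale = factor * m_norm / norm
    return [x * scale for x in h]

h = [3.0, -1.0, 6.0]        # l1 norm = 10
h_scaled = normalize(h)     # target l1 norm = 10 * 1 = 10
```

Under this reading, the scaled hidden state dominates the additive term vectors by a fixed factor, so the lexicon acts as a bounded correction rather than overwhelming the encoder's representation.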
The four baselines we select are: a naïve majority classifier, which always predicts the most frequent label NEUTRAL; Support Vector Machines (SVM) [CV95] as implemented by scikit-learn4 [PVG+11], using TF-IDF and word2vec embeddings as input (for word2vec, we sum all word embeddings in a sentence and feed the result into the model); and Naïve Bayes. We fine-tune the pre-trained model on our training set for 2 epochs with a learning rate of 3e-6, following the settings recommended in the original BERT paper [DCLT19].

4.3 Classification Results

Table 3 shows our first set of experimental results for classifying sentences as either emotional or non-emotional (i.e., for detecting emotional content in narratives). In the evaluation results, the pre-trained language model with the context feature outperforms all other models, which demonstrates the effectiveness of incorporating the contextual feature. BERT(WF) performs even worse than the naïve classifier, which shows the necessity of fine-tuning. BERT(CF) outperforms vanilla BERT by 2.36% in accuracy and 3.38% in precision, and performs 12.03% better in accuracy and 13.87% better in precision than the strongest baseline model, Naïve Bayes. As Table 3 shows, the improvement from incorporating extra knowledge is not obvious; we argue this may be caused by the limited coverage of our knowledge base and the way we incorporate knowledge. Experiments were also run for Bi-LSTMs and CNNs; however, the results are not included, as overall they performed worse than all models listed.

Table 3: Accuracy (%) of emotional versus non-emotional classification and comparison between different features. Our baseline models are the naïve majority classifier (NMC), SVM with TF-IDF and word embeddings, and Naïve Bayes. The next 5 models are all BERT-based, in which WF means Without Fine-tuning, CF means Contextual Feature, EK means Extra Knowledge and CK means Contextual & Knowledge.

                  Ave. Accuracy (%)   Ave. Precision (%)   Ave. Recall (%)   Ave. F1 (%)
  NMC             57.86               28.93                50.00             36.65
  SVM+TF-IDF      61.81               63.04                63.15             61.80
  SVM+Word2Vec    62.57               64.65                56.99             53.38
  Naïve Bayes     62.99               62.30                59.01             57.97
  BERT(WF)        57.77               53.35                57.78             43.55
  BERT            72.66               72.79                70.33             70.70
  BERT(CF)        75.02               76.17                72.37             72.85
  BERT(EK)        72.83               73.01                71.12             70.97
  BERT(CK)        75.02               76.17                72.37             72.85

Table 4: Evaluation results on multi-class classification for emotion labels; metrics are averaged over all classes.

                  Ave. Accuracy (%)   Ave. Precision (%)   Ave. Recall (%)   Ave. F1 (%)
  Naïve Bayes     57.27               36.76                57.27             43.19
  SVM+TF-IDF      59.20               51.08                59.21             51.63
  SVM+Word2Vec    59.29               57.71                59.29             58.48
  BERT            63.07               60.42                63.07             61.71
  BERT(CF)        66.28               62.23                66.27             64.19

Furthermore, we conduct experiments on multi-class classification over all labels: HAPPY, POSITIVELY SURPRISED, FEARFUL, ANGRY, SAD, DISGUSTED, NEGATIVELY SURPRISED and NEUTRAL. Results are shown in Table 4. In this second experiment, we unfortunately did not employ the knowledge feature because our knowledge base is incompatible with the label set. Results in terms of accuracy, precision, recall and F1-score are included. From Table 4, we can see that BERT with the contextual feature outperforms the other models on all four metrics. The averaged accuracy of BERT(CF) improves by approximately 6.99% over SVM+Word2Vec, by 7.08% over SVM+TF-IDF and by 3.21% over vanilla BERT. Since our dataset is imbalanced, the performance of our model seems likely to improve if more annotated data becomes available.

3 https://huggingface.co/transformers/
4 https://scikit-learn.org/stable/index.html

5 Conclusion and Future Work

In this paper, we apply pre-trained language models to sentence-level sentiment analysis of narrative texts and introduce two new features into the model: a context feature and an extra knowledge feature. We evaluate the performance of a pre-trained model for sentiment analysis of fairy tales with a range of features.
Experiments demonstrate that our method improves classification accuracy, showing the importance of incorporating context for sentiment analysis of narrative texts. In this work, we propose a sentence-level model for classifying the sentiment of narrative texts; how to build a document-level model on top of it remains to be investigated. Our extra knowledge dataset covers only a small portion of the text, owing to the difficulty of acquiring a knowledge set with sentiment annotation; hence our incorporation of knowledge did not yield a significant improvement. In future work, we aim to build a larger knowledge dataset to improve coverage for narrative text. In addition, we would like to further investigate alternative ways to inject knowledge into sentiment prediction for narrative text.

Acknowledgements

This work was supported by Science Foundation Ireland. The authors would like to thank Jennifer Foster and three anonymous reviewers for their helpful comments.

References

[Alm08] Cecilia Alm. Affect in Text and Speech. PhD thesis, University of Illinois at Urbana-Champaign, 2008.

[CKLM19] Kevin Clark, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. What does BERT look at? An analysis of BERT's attention. CoRR, abs/1906.04341, 2019.

[CV95] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, September 1995.

[DC07] Sanjiv R. Das and Mike Y. Chen. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9):1375–1388, 2007.

[DCLT19] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics.
[dSG14] Cícero dos Santos and Maíra Gatti. Deep convolutional neural networks for sentiment analysis of short texts. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 69–78, Dublin, Ireland, August 2014. Dublin City University and Association for Computational Linguistics.

[HS00] Alison Huettner and Pero Subasic. Fuzzy typing for document management. ACL 2000 Companion Volume: Tutorial Abstracts and Demonstration Notes, pages 26–27, 2000.

[JBBHS18] Charles Jochim, Francesca Bonin, Roy Bar-Haim, and Noam Slonim. SLIDE - a sentiment lexicon of common idioms. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, May 2018. European Language Resources Association (ELRA).

[JSS19] Ganesh Jawahar, Benoît Sagot, and Djamé Seddah. What does BERT learn about the structure of language? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3651–3657, Florence, Italy, July 2019. Association for Computational Linguistics.

[KF16] Rasoul Kaljahi and Jennifer Foster. Detecting opinion polarities using kernel methods. In Proceedings of the Workshop on Computational Modeling of People's Opinions, Personality, and Emotions in Social Media (PEOPLES), pages 60–69, Osaka, Japan, December 2016. The COLING 2016 Organizing Committee.

[Moh13] Saif Mohammad. From once upon a time to happily ever after: Tracking emotions in novels and fairy tales. CoRR, abs/1309.5909, 2013.

[NIK10] Tetsuji Nakagawa, Kentaro Inui, and Sadao Kurohashi. Dependency tree-based sentiment classification using CRFs with hidden variables. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 786–794. Association for Computational Linguistics, 2010.

[PLV02] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan.
Thumbs up? Sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10, EMNLP '02, pages 79–86, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.

[PNI+18] Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. Deep contextualized word representations. In Proceedings of NAACL-HLT, pages 2227–2237, 2018.

[PVG+11] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[RNSS] Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving language understanding by generative pre-training.

[SPW+13] Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA, October 2013. Association for Computational Linguistics.

[Too12] Michael Toolan. Narrative: A Critical Linguistic Introduction. Routledge, 2012.

[VC12] G. Vinodhini and R. M. Chandrasekaran. Sentiment analysis and opinion mining: A survey. International Journal, 2(6):282–292, 2012.

[VCB14] Andrea Vanzo, Danilo Croce, and Roberto Basili. A context-based model for sentiment analysis in Twitter. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pages 2345–2354, 2014.

[VSP+17] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need.
CoRR, abs/1706.03762, 2017.

[WDS+19] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. HuggingFace's Transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771, 2019.

[WHZ+16] Yequan Wang, Minlie Huang, Li Zhao, et al. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 606–615, 2016.

[WWB+04] Janyce Wiebe, Theresa Wilson, Rebecca Bruce, Matthew Bell, and Melanie Martin. Learning subjective language. Computational Linguistics, 30(3):277–308, 2004.

[ZWX16] Xinjie Zhou, Xiaojun Wan, and Jianguo Xiao. Attention-based LSTM network for cross-lingual sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 247–256, 2016.

[ZZL+09] Changli Zhang, Daniel Zeng, Jiexun Li, Fei-Yue Wang, and Wanli Zuo. Sentiment analysis of Chinese documents: From sentence to document level. Journal of the American Society for Information Science and Technology, 60(12):2474–2487, December 2009.