<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Incorporating Context and Knowledge for Better Sentiment Analysis of Narrative Text</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Chenyang</forename><surname>Lyu</surname></persName>
							<email>chenyang.lyu2@mail.dcu.ie</email>
							<affiliation key="aff0">
								<orgName type="department">School of Computing</orgName>
								<orgName type="institution">Dublin City University Dublin</orgName>
								<address>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Tianbo</forename><surname>Ji</surname></persName>
							<email>tianbo.ji2@mail.dcu.ie</email>
							<affiliation key="aff1">
								<orgName type="department">ADAPT Centre</orgName>
								<orgName type="institution">Dublin City University</orgName>
								<address>
									<settlement>Dublin</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yvette</forename><surname>Graham</surname></persName>
							<email>yvette.graham@dcu.ie</email>
							<affiliation key="aff2">
								<orgName type="department">ADAPT Centre</orgName>
								<orgName type="institution">Dublin City University</orgName>
								<address>
									<settlement>Dublin</settlement>
									<country key="IE">Ireland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Incorporating Context and Knowledge for Better Sentiment Analysis of Narrative Text</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">E63C527CC1A28E43531D7756A15D3351</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T14:12+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Recent years have witnessed a significant increase in the availability of online narrative texts, such as news articles and online stories. Automatic sentiment classification of such narrative texts will enable better organisation, search and retrieval. However, despite much work to date on sentiment analysis of social media and user feedback, little research has addressed automatic prediction of the sentiment of narrative text. In this paper, we present an approach to sentiment analysis of narrative text that employs a pre-trained language model, an approach already proven effective for a range of other NLP tasks. For the purpose of sentiment analysis of narrative text in particular, we introduce two new features: a contextual feature and an extra knowledge feature, both of which prove to aid text understanding for the prediction of sentiment. We conduct preliminary experiments on a publicly available dataset of fairy tale texts. Results show an increase in accuracy over a vanilla pre-trained language model and baseline models for the classification of sentiment in narrative texts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Sentiment analysis refers to the analysis and prediction of sentiment and emotion in language. Most research carried out to date in this domain has been based on user-generated content such as social media and user feedback <ref type="bibr" target="#b18">[VC12]</ref>. Recently, with the development of online narrative resources such as web fiction and news, interest in sentiment analysis of narrative text has been slowly expanding <ref type="bibr" target="#b10">[Moh13]</ref>, with the aim of enabling better organisation and retrieval. The sentiment and emotion of narrative text can be defined as a general feeling reflected in the text; in the current work we use sentiment to stand for both sentiment and emotion.</p><p>Contrary to other types of text, narrative is a perceived sequence of non-randomly connected events <ref type="bibr" target="#b17">[Too12]</ref>. For example, A countryman's son stepped on a snake's tail accidentally. The tail suddenly turned and hit him so that he died. The father was very angry so that he cut off part of the snake's tail. is a piece of narrative text 1 composed of connected events. Due to its complexity, analysing the sentiment of narrative text requires a deep understanding of semantic structure and opinions. Since the events that shape the plot of narrative texts are commonly interrelated, it is all the more important to take context sufficiently into consideration. Furthermore, several linguistic phenomena, such as the metaphor in the sentence The lack of a proper standing army is the Achilles heel of this small country, as well as terminology, require additional knowledge in order to be correctly interpreted. 
Incorporating such additional knowledge has the potential to help predict the sentiment of text.</p><p>Nowadays, pre-trained language models, such as those in <ref type="bibr" target="#b4">[DCLT19]</ref>, [PNI + 18] and <ref type="bibr">[RNSS]</ref>, have proven effective for a range of syntactic and semantic tasks, such as textual similarity, sentiment classification and textual entailment. These models are usually trained on large-scale corpora and are able to capture and model semantic dependencies over long text. Such pre-trained models have the potential to detect and classify the sentiment of narrative text.</p><p>In this paper, we propose a model that employs a state-of-the-art pre-trained language model to classify the sentiment class of narrative text. The model takes context into consideration as well as additional knowledge as features to predict the sentiment class. First, what we call context features are used to generate a context-aware representation of a given sentence, before it is used to determine the sentence's sentiment class. We provide a detailed description of the methodology and experiments investigating the accuracy of our approach in Sections 3 and 4, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Sentiment analysis is generally regarded as a text classification or text mining problem, and there are two main approaches: rule-based and learning-based. Early work on sentiment analysis took a rule-based approach, using a pre-defined lexicon with corresponding sentiment polarities to predict the polarity of a given text from word occurrences [HS00, DC07, WWB + 04]. Besides lexicon-based methods, Zhang et al. [ZZL + 09] proposed a document-level sentiment analysis framework that determines the sentiment polarity of sentences based on word dependencies before aggregating sentences for document-level sentiment prediction; the polarity of a single sentence is computed independently of context.</p><p>However, the sentiment of text can be expressed in a manner too subtle for such approaches to detect, since it is difficult to classify text based only on indicative lexical features. Machine learning algorithms have the potential to address the shortcomings of rule-based approaches, although learning-based approaches require a labeled training set to be trained on before predicting new samples. Pang et al. <ref type="bibr" target="#b12">[PLV02]</ref> proposed three machine learning models, Naïve Bayes, Support Vector Machine and Maximum Entropy, to predict sentiment based on bag-of-words features. <ref type="bibr" target="#b11">[NIK10]</ref> proposed a dependency tree-based model to capture interactions between words, while <ref type="bibr" target="#b19">[VCB14]</ref> and <ref type="bibr" target="#b9">[KF16]</ref> apply a range of kernel methods.</p><p>Sentiment analysis with neural networks has a long history. 
For example, Socher et al. [SPW + 13] proposed Recursive Neural Tensor Networks (RNTNs) to model sentence-level sentiment recursively from the sentence's syntactic tree.</p><p>[dSG14], on the other hand, adopted Convolutional Neural Networks (CNNs) to classify sentiment in short text, their model outperforming traditional machine learning models such as Naïve Bayes and SVM. Alternative approaches using Recurrent Neural Networks have also been proposed, such as [WHZ + 16] and <ref type="bibr" target="#b24">[ZWX16]</ref>, the latter employing attention-based LSTM models.</p><p>Recently, large-scale pre-trained models have emerged and have been shown to be very effective on various tasks [PNI + 18, RNSS, DCLT19], including reading comprehension and sentiment classification. Pre-trained models are trained on large unlabeled corpora in an unsupervised setting; after pre-training they are able to capture rich semantic patterns and syntactic information from text <ref type="bibr" target="#b8">[JSS19]</ref>. For instance, the noun modifier (former) in the sentence The 45-year-old former General Electric Co. executive figures it will be easier this time. can be clearly attended to its noun (executive) by the self-attention mechanism in BERT <ref type="bibr" target="#b1">[CKLM19]</ref>. Pre-trained models are well suited to modelling the sentiment of text; we therefore employ BERT (Bidirectional Encoder Representations from Transformers) for our task because of its capability to model bidirectional dependencies in text.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Methodology</head><p>Most current state-of-the-art pre-trained language models are based on a neural architecture known as the transformer [VSP + 17], originally proposed by Vaswani et al. In this study, we use the pre-trained model BERT [WDS + 19]. The BERT <ref type="bibr" target="#b4">[DCLT19]</ref> model we use has been pre-trained on a large unlabeled corpus containing over 3 billion words. This large volume of data enables it to model complex language dependencies in text and produce very rich representations of text. The input is represented as document = {s_1, ..., s_n}, s_i = {w_{i,1}, ..., w_{i,m}}, where s_i is the i-th sentence in the document. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">BERT: Bidirectional Encoder Representations from Transformers</head><p>We use BERT as an encoder to generate the sentiment label for a single sentence. BERT is built from transformer blocks, each containing one self-attention layer, two layer norms and one feed-forward layer. The model we employ consists of 12 stacked transformer blocks, and the output of its last layer is used to obtain the sentiment label. The overall process of computing the sentiment label for a single sentence can be formulated as follows:</p><formula xml:id="formula_0">h_l = bert_encoder(w_1, w_2, ..., w_k)<label>(1)</label></formula><p>y = softmax(linear_layer(pooling_layer(h_l), context))</p><p>where h_l denotes the hidden state vectors from the last layer; they are fed into a pooling layer<ref type="foot" target="#foot_0">2</ref> and a linear layer, then passed into a softmax layer together with the context vector (introduced in Section 3.2) to generate the sentiment label y. The overall architecture of our model is shown in Figure <ref type="figure" target="#fig_0">1</ref>.</p></div>
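The classification step of Equations (1)-(2) can be sketched in plain NumPy. This is an illustrative assumption about the pooling and linear layers, not the authors' actual implementation: the pooling layer selects the [CLS] vector (as the footnote describes), which is combined with a context vector, projected linearly, and normalised with softmax; all shapes and weights here are made-up placeholders.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D logit vector
    e = np.exp(x - x.max())
    return e / e.sum()

def classify_sentence(h_l, context, W, b):
    """Sketch of y = softmax(linear_layer(pooling_layer(h_l), context)).

    h_l:     (k, d) last-layer hidden states for k tokens
    context: (d,)   context vector (see Section 3.2)
    W, b:    (num_labels, 2*d) weights and (num_labels,) bias (hypothetical)
    """
    cls_vec = h_l[0]                        # "pooling layer": take the [CLS] token vector
    features = np.concatenate([cls_vec, context])
    return softmax(W @ features + b)        # distribution over sentiment labels

# Toy dimensions purely for illustration
rng = np.random.default_rng(0)
k, d, num_labels = 5, 8, 2
probs = classify_sentence(rng.normal(size=(k, d)),
                          rng.normal(size=d),
                          rng.normal(size=(num_labels, 2 * d)),
                          np.zeros(num_labels))
```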
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Contextual Feature</head><p>Usually, the sentiment label of a single sentence in a document is conditioned only on the sentence itself:</p><formula xml:id="formula_2">sentiment_label = argmax_y p(y|s_j)<label>(3)</label></formula><p>which does not take advantage of the context. However, we argue that in narrative, the context of a given sentence plays a crucial role in determining its semantic meaning as well as its sentiment label.</p><p>We propose to incorporate context features when computing the sentiment label for a sentence by combining the representation vectors of its context sentences, giving the sentiment label as follows:</p><formula xml:id="formula_3">sentiment_label = argmax_y p(y|s_j, context_j)<label>(4)</label></formula><formula xml:id="formula_4">context_j = φ(s_{j-t}, s_{j-t+1}, ..., s_{j-1})<label>(5)</label></formula><p>where t is the number of context sentences we incorporate and φ is the function we use to combine sentence vectors. In our framework, we use a linear combination of the context sentences and the target sentence to generate the sentiment label. In this work, we use the sentence immediately preceding the current sentence as its context, as it has the most significant effect on the sentiment of the current sentence.</p></div>
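Equations (4)-(5) can be illustrated with a minimal sketch, assuming the combination function is a simple weighted linear combination of the preceding sentence vectors and the target-sentence vector; the weight value and vectors below are fabricated for illustration only.

```python
import numpy as np

def combine_context(sentence_vecs, j, t=1, weight=0.5):
    """Linearly combine the t preceding sentence vectors with sentence j.

    Sketch of context_j = phi(s_{j-t}, ..., s_{j-1}) followed by a linear
    combination with s_j; `weight` is a hypothetical mixing coefficient.
    """
    context = sentence_vecs[max(0, j - t):j]   # preceding sentences
    combined = sentence_vecs[j].copy()         # target sentence
    for c in context:
        combined += weight * c
    return combined

# Three toy sentence vectors; with t=1 only the immediately preceding
# sentence contributes, as in the paper's setting.
vecs = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])]
rep = combine_context(vecs, j=2, t=1)   # -> [2.0, 2.5]
```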
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Extra Knowledge</head><p>Besides the context feature, we propose to use extra knowledge to improve understanding of the sentiment class of text. In this paper, we incorporate entity information and idiom information into our model. We use the idiom dataset SLIDE from <ref type="bibr" target="#b7">[JBBHS18]</ref>, which includes over 5000 idioms with sentiment annotation. In the SLIDE dataset, every idiom has a sentiment distribution with three entries; each entry scales from 0 to 1 and the three entries sum to 1. In our framework, the extra knowledge is represented by a sentiment-term matrix, in which each entry for a given term represents its inclination towards the corresponding sentiment label and all entries for a term sum to one. The sentiment-term matrix is defined as:</p><formula xml:id="formula_5">M_{i,j} = p(y_j|term_i), Σ_{j=1}^{z} M_{i,j} = 1<label>(6)</label></formula><p>We incorporate the extra knowledge as follows:</p><formula xml:id="formula_6">p(y|s_j, context_j) = softmax(normalize(h_l) + Σ_{i=1}^{v} M_i)<label>(7)</label></formula><p>where M_i is the sentiment distribution vector of a term matched in the given sentence and v is the number of matched terms. We first normalize the hidden state h_l and then add the normalized h_l to all term vectors {M_i}, i = 1, ..., v, which affect the sentiment of the text. Our normalize function normalizes the hidden state h_l and scales it by a factor λ, which sets the ratio of the l1 norms of h_l and M_i to λ. In our experiments, we set λ to 10.</p>
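A minimal sketch of Equations (6)-(7): a sentiment-term matrix maps each known term (e.g. an idiom from SLIDE) to a label distribution; the hidden state is l1-normalised and scaled by λ = 10, and the vectors of matched terms are added before the softmax. The tiny lexicon, hidden state and label order below are fabricated for illustration.

```python
import numpy as np

LAMBDA = 10.0  # scaling factor from the paper

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def predict_with_knowledge(h, sentence_terms, M):
    """Sketch of Eq. (7): softmax(normalize(h) + sum of matched term vectors).

    Each row of M sums to 1, so scaling h to l1 norm LAMBDA makes the ratio
    of l1 norms ||h||_1 / ||M_i||_1 equal to LAMBDA.
    """
    matched = [M[t] for t in sentence_terms if t in M]
    h_scaled = LAMBDA * h / np.abs(h).sum()
    return softmax(h_scaled + sum(matched, np.zeros_like(h)))

# Hypothetical one-entry lexicon; label order [positive, negative, neutral]
M = {"achilles heel": np.array([0.1, 0.8, 0.1])}
probs = predict_with_knowledge(np.array([0.2, 0.5, 0.3]),
                               ["achilles heel"], M)
```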
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments and Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Data</head><p>For a preliminary validation of our methodology, we evaluate our proposed model with different features on the dataset of <ref type="bibr" target="#b0">[Alm08]</ref>. This dataset includes 176 fairy tales from three sets of children's stories: Beatrix Potter (19 stories), H. C. Andersen (77 stories), and the Brothers Grimm (80 stories). In total, 15,302 sentences are annotated with sentence-level emotion labels. The label set consists of eight labels, categorised as either emotional or non-emotional. The label distribution is shown in Table <ref type="table">1</ref>. The dataset is imbalanced, with NEUTRAL the most frequent label at 66.26%. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>In this paper, we apply pre-trained language models to sentence-level sentiment analysis of narrative texts and introduce two new features into the model: a context feature and an extra knowledge feature. We evaluate the performance of a pre-trained model for sentiment analysis of fairy tales with a range of features. Experiments demonstrate that our method improves classification accuracy, showing the importance of incorporating context for sentiment analysis of narrative texts. In this work, we propose a sentence-level model to classify the sentiment of narrative texts; how to build a document-level model on top of it remains to be investigated.</p><p>Our extra knowledge dataset covers only a small portion of the text due to limitations in acquiring a knowledge set with sentiment annotation; hence our incorporation of knowledge did not yield a significant improvement. In future work, we aim to build a larger knowledge dataset to improve coverage for narrative text. In addition, we would like to further investigate alternative ways of injecting knowledge into sentiment prediction for narrative text.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Model Architecture for Sentence-Level Sentiment Classification Incorporating Contextual Feature and Knowledge. The blue arrows stand for contextual vectors from previous sentences.</figDesc><graphic coords="3,101.59,84.15,412.43,177.19" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Label distribution in the dataset, where H is HAPPY, Su+ is POSITIVELY SURPRISED, F is FEARFUL, A is ANGRY, Sa is SAD, D is DISGUSTED, Su- is NEGATIVELY SURPRISED, N is NEUTRAL. We first split the dataset described in Section 4.1 into training and test sets. Since our study takes context into consideration, we select 10% of the stories from each set of stories as the test set. Our test set contains 16 stories, 1,189 sentences in total. The label distribution in the training and test sets is shown in Table 2.</figDesc><table><row><cell></cell><cell></cell><cell cols="2">Emotional</cell><cell></cell><cell></cell><cell></cell><cell cols="2">Non-Emotional</cell></row><row><cell>H</cell><cell>Su+</cell><cell>F</cell><cell>A</cell><cell>Sa</cell><cell>D</cell><cell>Su-</cell><cell>N</cell><cell></cell></row><row><cell cols="7">10.52% 2.15% 4.55% 4.77% 5.43% 3.03% 3.29%</cell><cell>66.26%</cell><cell></cell></row><row><cell>4.2 Experiment Setup</cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell><cell></cell></row><row><cell></cell><cell cols="7">Table 2: Label distribution in training set and test set.</cell><cell></cell></row><row><cell>label</cell><cell>H</cell><cell>Su+</cell><cell>F</cell><cell>A</cell><cell>Sa</cell><cell>D</cell><cell>Su-</cell><cell>N</cell></row><row><cell cols="9">training set 10.58% 2.12% 4.44% 4.32% 5.31% 3.08% 3.19% 66.96%</cell></row><row><cell>test set</cell><cell cols="8">9.84% 2.52% 5.88% 10.17% 6.90% 2.35% 4.46% 57.86%</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">We use "pooling layer" in accordance with huggingface and this layer simply selects the representation vector of token [CLS], which is a special token added in front of every sentence.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://huggingface.co/transformers/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://scikit-learn.org/stable/index.html</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work was supported by Science Foundation Ireland. The authors would like to thank Jennifer Foster and three anonymous reviewers for their helpful comments.</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The pre-trained model we use is BERT-Base-Cased 3 . We select four baselines: a naïve majority classifier, which always predicts the most likely label, NEUTRAL; Support Vector Machines (SVM) <ref type="bibr" target="#b2">[CV95]</ref> implemented in scikit-learn 4 [PVG + 11] with TF-IDF features; SVM with word2vec embeddings, where we sum all word embeddings in a sentence and feed the result into the model; and Naïve Bayes. We fine-tune the pre-trained model on our training set for 2 epochs with a learning rate of 3e-6, the settings recommended in the original BERT paper <ref type="bibr" target="#b4">[DCLT19]</ref>.</p></div>
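The TF-IDF + SVM baseline described above can be sketched with scikit-learn as follows; the toy training sentences and labels are fabricated stand-ins for the fairy-tale data, not samples from the actual dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy data: labels are emotional (1) vs non-emotional (0)
train_texts = ["the father was very angry", "he walked along the road",
               "she wept bitterly all night", "the house stood on a hill"]
train_labels = [1, 0, 1, 0]

# TF-IDF features fed into a linear-kernel SVM, as in the baseline setup
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(train_texts, train_labels)
preds = clf.predict(["the boy was furious", "a path led to the village"])
```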
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Classification Results</head><p>Table <ref type="table">3</ref> shows our first set of experimental results for classifying sentences as either emotional or non-emotional (i.e., for detecting emotional content in narratives). In the evaluation results, the pre-trained language model with the context feature outperforms all other models, which demonstrates the effectiveness of incorporating the contextual feature. BERT(WF) performs even worse than the naïve classifier, which shows the necessity of fine-tuning. BERT(CF) outperforms vanilla BERT by 2.36% in accuracy and 3.38% in precision, and performs 12.03% better in accuracy and 13.87% better in precision than the strongest baseline model, Naïve Bayes. As can be seen from Table <ref type="table">3</ref>, the improvement from incorporating extra knowledge is not obvious; we argue this may be caused by the limited coverage of our knowledge base and the way we incorporate knowledge. Experiments were also run for Bi-LSTMs and CNNs, but the results are not included; overall, they performed worse than all models listed. Furthermore, we conduct experiments on multi-class classification over all labels: HAPPY, POSITIVELY SURPRISED, FEARFUL, ANGRY, SAD, DISGUSTED, NEGATIVELY SURPRISED and NEUTRAL. Results are shown in Table <ref type="table">4</ref>. In this second experiment we unfortunately could not employ the knowledge feature because our knowledge base is incompatible with the label set. Results in terms of accuracy, precision, recall and F1 score are included. From Table <ref type="table">4</ref>, we can see that BERT with the contextual feature outperforms the other models on all four metrics. The averaged accuracy of BERT(CF) improved by approximately 6.99% over SVM+Word2Vec, by 7.08% over SVM+TF-IDF and by 3.21% over vanilla BERT. Since our dataset is imbalanced, the performance of our model seems likely to improve if more annotated data becomes available.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Affect in Text and Speech</title>
		<author>
			<persName><forename type="first">Cecilia</forename><surname>Alm</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
		<respStmt>
			<orgName>University of Illinois at Urbana-Champaign</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">PhD thesis</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">What does BERT look at? an analysis of bert&apos;s attention</title>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Urvashi</forename><surname>Khandelwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Omer</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<idno>CoRR, abs/1906.04341</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Support-vector networks</title>
		<author>
			<persName><forename type="first">Corinna</forename><surname>Cortes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vladimir</forename><surname>Vapnik</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine Learning</title>
				<imprint>
			<date type="published" when="1995-09">Sep 1995</date>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="page" from="273" to="297" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Yahoo! for amazon: Sentiment extraction from small talk on the web</title>
		<author>
			<persName><forename type="first">R</forename><surname>Sanjiv</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mike</forename><forename type="middle">Y</forename><surname>Das</surname></persName>
		</author>
		<author>
			<persName><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Management Science</title>
		<imprint>
			<biblScope unit="volume">53</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page" from="1375" to="1388" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">Jacob</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ming-Wei</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenton</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kristina</forename><surname>Toutanova</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
		<title level="s">Long and Short Papers</title>
		<meeting>the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Minneapolis, Minnesota</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-06">June 2019</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="4171" to="4186" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Deep convolutional neural networks for sentiment analysis of short texts</title>
		<author>
			<persName><forename type="first">Santos</forename><surname>Cícero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maíra</forename><surname>Gatti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</title>
				<meeting>COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers<address><addrLine>Dublin, Ireland; Dublin City</addrLine></address></meeting>
		<imprint>
			<publisher>University and Association for Computational Linguistics</publisher>
			<date type="published" when="2014-08">August 2014</date>
			<biblScope unit="page" from="69" to="78" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Fuzzy typing for document management</title>
		<author>
			<persName><forename type="first">Alison</forename><surname>Huettner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pero</forename><surname>Subasic</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">ACL 2000 Companion Volume: Tutorial Abstracts and Demonstration Notes</title>
				<imprint>
			<date type="published" when="2000">2000</date>
			<biblScope unit="page" from="26" to="27" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">SLIDE -a sentiment lexicon of common idioms</title>
		<author>
			<persName><forename type="first">Charles</forename><surname>Jochim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francesca</forename><surname>Bonin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roy</forename><surname>Bar-Haim</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Noam</forename><surname>Slonim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</title>
				<meeting>the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)<address><addrLine>Miyazaki, Japan</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2018-05">May 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">What does BERT learn about the structure of language?</title>
		<author>
			<persName><forename type="first">Ganesh</forename><surname>Jawahar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Benoît</forename><surname>Sagot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Djamé</forename><surname>Seddah</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 57th Annual Meeting of the Association for Computational Linguistics<address><addrLine>Florence, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019-07">July 2019</date>
			<biblScope unit="page" from="3651" to="3657" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Detecting opinion polarities using kernel methods</title>
		<author>
			<persName><forename type="first">Rasoul</forename><surname>Kaljahi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jennifer</forename><surname>Foster</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Computational Modeling of People&apos;s Opinions, Personality, and Emotions in Social Media (PEOPLES)</title>
				<meeting>the Workshop on Computational Modeling of People&apos;s Opinions, Personality, and Emotions in Social Media (PEOPLES)<address><addrLine>Osaka, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-12">December 2016</date>
			<biblScope unit="page" from="60" to="69" />
		</imprint>
	</monogr>
	<note>The COLING 2016. Organizing Committee</note>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">From once upon a time to happily ever after: Tracking emotions in novels and fairy tales</title>
		<author>
			<persName><forename type="first">Saif</forename><surname>Mohammad</surname></persName>
		</author>
		<idno>CoRR, abs/1309.5909</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Dependency tree-based sentiment classification using crfs with hidden variables</title>
		<author>
			<persName><forename type="first">Tetsuji</forename><surname>Nakagawa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kentaro</forename><surname>Inui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sadao</forename><surname>Kurohashi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics</title>
				<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="786" to="794" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Thumbs up?: Sentiment classification using machine learning techniques</title>
		<author>
			<persName><forename type="first">Bo</forename><surname>Pang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lillian</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Shivakumar</forename><surname>Vaithyanathan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10</title>
				<meeting>the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10<address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="79" to="86" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Deep contextualized word representations</title>
		<author>
			<persName><forename type="first">Matthew</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mohit</forename><surname>Iyyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matt</forename><surname>Gardner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kenton</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luke</forename><surname>Zettlemoyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NAACL-HLT</title>
				<meeting>NAACL-HLT</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="2227" to="2237" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Improving language understanding by generative pre-training</title>
		<author>
			<persName><forename type="first">Alec</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Karthik</forename><surname>Narasimhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tim</forename><surname>Salimans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Recursive deep models for semantic compositionality over a sentiment treebank</title>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alex</forename><surname>Perelygin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jean</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jason</forename><surname>Chuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><surname>Potts</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2013 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Seattle, Washington, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013-10">October 2013</date>
			<biblScope unit="page" from="1631" to="1642" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Narrative: A critical linguistic introduction</title>
		<author>
			<persName><forename type="first">Michael</forename><surname>Toolan</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2012">2012</date>
			<publisher>Routledge</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Sentiment analysis and opinion mining: a survey</title>
		<author>
			<persName><forename type="first">G</forename><surname>Vinodhini</surname></persName>
		</author>
		<author>
			<persName><surname>Chandrasekaran</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="282" to="292" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">A context-based model for sentiment analysis in Twitter</title>
		<author>
			<persName><forename type="first">Andrea</forename><surname>Vanzo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Danilo</forename><surname>Croce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roberto</forename><surname>Basili</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</title>
				<meeting>COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="2345" to="2354" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<monogr>
		<title level="m" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">Ashish</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Noam</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Niki</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jakob</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Llion</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aidan</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lukasz</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Illia</forename><surname>Polosukhin</surname></persName>
		</author>
		<idno>CoRR, abs/1706.03762</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Huggingface&apos;s transformers: State-of-the-art natural language processing</title>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Wolf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lysandre</forename><surname>Debut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Victor</forename><surname>Sanh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julien</forename><surname>Chaumond</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Clement</forename><surname>Delangue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anthony</forename><surname>Moi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pierric</forename><surname>Cistac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tim</forename><surname>Rault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rémi</forename><surname>Louf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Morgan</forename><surname>Funtowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jamie</forename><surname>Brew</surname></persName>
		</author>
		<idno>ArXiv, abs/1910.03771</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Attention-based LSTM for aspect-level sentiment classification</title>
		<author>
			<persName><forename type="first">Yequan</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Minlie</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Li</forename><surname>Zhao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 conference on empirical methods in natural language processing</title>
				<meeting>the 2016 conference on empirical methods in natural language processing</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="606" to="615" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Learning subjective language</title>
		<author>
			<persName><forename type="first">Janyce</forename><surname>Wiebe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Theresa</forename><surname>Wilson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rebecca</forename><surname>Bruce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matthew</forename><surname>Bell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Melanie</forename><surname>Martin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational linguistics</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="277" to="308" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Attention-based LSTM network for cross-lingual sentiment classification</title>
		<author>
			<persName><forename type="first">Xinjie</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xiaojun</forename><surname>Wan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jianguo</forename><surname>Xiao</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 conference on empirical methods in natural language processing</title>
				<meeting>the 2016 conference on empirical methods in natural language processing</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="247" to="256" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Sentiment analysis of Chinese documents: From sentence to document level</title>
		<author>
			<persName><forename type="first">Changli</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daniel</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jiexun</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fei-Yue</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wanli</forename><surname>Zuo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Am. Soc. Inf. Sci. Technol</title>
		<imprint>
			<biblScope unit="volume">60</biblScope>
			<biblScope unit="issue">12</biblScope>
			<biblScope unit="page" from="2474" to="2487" />
			<date type="published" when="2009-12">December 2009</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
