Explaining Sentiment from Lexicon

Sergio Consoliᵃ, Luca Barbagliaᵃ and Sebastiano Manzanᵃ

ᵃ European Commission, Joint Research Centre (JRC), Via E. Fermi 2749, I-21027 Ispra (VA), Italy

Abstract

Lexicon-based Sentiment Analysis relies on sentiment dictionaries, which are used to assign a sentiment polarity to the words of an input text. The overall sentiment of the text is then computed by means of a combining function, such as the word count, sum or average. In this short contribution we describe a detailed set of linguistic rules for identifying the text fragments that are semantically linked to a given concept of interest in a text. These heuristics have been designed in the spirit of the recent Interpretable AI trend: they make it possible to trace the origin of the sentiment for a specific term, providing more transparency and interpretability of the resulting analysis, and enabling the development of advanced and novel lexicon-based Sentiment Analysis approaches, which is the object of our ongoing work.

Keywords

Lexicon-based Sentiment Analysis, Natural Language Processing, Interpretability, Rule-based models

1. Introduction

The rapid advances in information and communications technology over the last two decades have produced an explosive growth in the amount of information collected, leading to the new era of Big Data [1]. This has led to an exponential increase in the information available in various domains, allowing Natural Language Processing (NLP) and novel knowledge generation methods to emerge in different sectors. In particular, the sentiment extracted from social media has long been used in a number of studies [2]. As the Web rapidly grows and evolves, people are becoming increasingly enthusiastic about interacting, sharing, and collaborating through social networks, online communities, blogs, and wikis [3].
Therefore, it is critical to correctly interpret sentiments and opinions expressed or reported about social events, political movements, company strategies, marketing campaigns, and any other form of online interaction.

Sentiment Analysis (SA) [4, 5], also known as Opinion Mining, is a Semantic Web technology, directly related to Natural Language Processing, that aims at understanding whether a certain textual message conveys a positive or negative sentiment with respect to a certain topic, or at determining the overall contextual polarity of, or emotional reaction to, a document, interaction, or event. Its outcome might be a quantitative or qualitative polarity (e.g., a score in [−1, 1], or a label such as extr neg, neg, neut, pos, extr pos) or an emotional state (e.g., joy, anger, etc.).

X-SENTIMENT: 6th International Workshop on eXplainable SENTIment Mining and EmotioN deTection, co-located at ESWC, June 07, 2021, Hersonissos, Greece
Email: sergio.consoli@ec.europa.eu (S. Consoli)
ORCID: 0000-0001-7357-5858 (S. Consoli); 0000-0001-5930-5392 (L. Barbaglia)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

SA can be performed using both machine learning methods (see e.g., [6, 7, 8]) and lexicon-based methods (see e.g., [9, 10, 11, 12, 13]). Models driven by machine learning algorithms and vector representations have achieved top performance on various SA tasks. Although these models may produce very accurate results, they provide only a limited understanding of the patterns and features used to classify the input text into sentiment categories. These models therefore lack transparency, traceability, and explainability about how decisions are taken.
In addition, another main disadvantage of machine learning models for sentiment analysis is their dependence on labelled training data: it is not always easy to obtain sufficient, correctly labelled data for specific domains.

Conversely, lexicon-based approaches to SA are completely unsupervised and do not require any a-priori training corpus. They rely instead on dictionaries of words with assigned positive or negative sentiment polarity scores, also referred to as sentiment dictionaries or lexicons, such as SentiWordNet [14]¹, SenticNet [15, 16]², and Harvard IV-4³, to name a few popular ones. Most of these sentiment dictionaries are freely available online and have often been reused by the scientific and professional communities in several applications.

Given a sentence, a lexicon-based SA approach assigns positive and negative sentiment polarity values from the dictionary to all the words in the sentence, and a combining function, such as the word count, sum or average [4], aggregates the scores into the overall sentiment of the text. In this way, developers are relieved from collecting and labelling a large, relevant training corpus, at the minor cost of reusing an existing sentiment dictionary or constructing, if needed, a customized sentiment lexicon for a specific application. In addition, a lexicon-based SA method can be more easily understood and, if necessary, modified by a human than a machine learning approach to SA, which is a significant advantage for the interpretability of the model results.

Most lexicon-based SA methods focus on a coarse-grained analysis of the sentiment expressed in the text [4]; that is, they assess the sentiment of an entire sentence by considering all expressions of positive and negative sentiment it contains.
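The coarse-grained scheme just described can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the dictionary TOY_LEXICON, its polarity values, the helper names word_polarities and combine, and the example sentence are all hypothetical; a real application would load one of the lexicons cited above instead.

```python
import re

# Toy sentiment dictionary: hypothetical words and scores, for illustration only.
# A real application would load SentiWordNet, SenticNet, Harvard IV-4, etc.
TOY_LEXICON = {
    "perfect": 1.0,
    "great": 0.8,
    "award": 0.6,
    "decline": -0.6,
    "worst": -1.0,
    "crisis": -0.8,
}


def word_polarities(text, lexicon=TOY_LEXICON):
    """Return the polarity score of every word of `text` found in the lexicon."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [lexicon[t] for t in tokens if t in lexicon]


def combine(scores, how="sum"):
    """Aggregate word-level scores into a single text-level sentiment score."""
    if how == "count":
        # Number of positive words minus number of negative words.
        return sum(1 for s in scores if s > 0) - sum(1 for s in scores if s < 0)
    if how == "sum":
        return sum(scores)
    if how == "average":
        # An empty list (no lexicon word found) is treated as neutral.
        return sum(scores) / len(scores) if scores else 0.0
    raise ValueError(f"unknown combining function: {how}")


if __name__ == "__main__":
    sentence = "The sector is in its worst decline, but exports look great."
    scores = word_polarities(sentence)
    for how in ("count", "sum", "average"):
        print(how, combine(scores, how))
```

All three combining functions return a single score for the whole sentence; none of them indicates whether a given polarity refers to "sector" or to "exports".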
However, coarse-grained methods might not be precise enough in evaluating the sentiment polarity of a specific concept of interest contained in a sentence, given that the sentiment of the entire text is often not expressed towards that specific topic [17].

In ongoing research, we are investigating a fine-grained perspective on lexicon-based SA. In particular, we are interested in identifying the parts of the text that convey a sentiment connotation with respect to a specific concept of interest, and in properly propagating these sentiments into an overall sentiment score for the topic. While the entire approach is currently under development, we report here the set of linguistic polarity rules used to identify the text fragments semantically connected to a specific concept of interest within a sentence, possibly expressing a sentiment connotation towards it. We believe that explicit semantics can be leveraged to explain why a resource has been scored in a specific sentiment category, inducing trustworthiness and avoiding biases, in line with the current model interpretability trend in AI, which aims at opening up the black box by providing a narrative of the underlying model [18, 19]. Our methodology under development goes in this direction.

¹ SentiWordNet, version 3.0, available at: https://github.com/aesuli/SentiWordNet.
² SenticNet, available at: https://sentic.net/.
³ The details of the latest version of the Harvard IV-4 dictionary are available at: http://www.wjh.harvard.edu/~inquirer/homecat.htm.

Table 1
Used spaCy part-of-speech tags.

TAG   POS    DESCRIPTION
CC    CONJ   conjunction, coordinating
IN    ADP    conjunction, subordinating or preposition
JJ    ADJ    adjective
JJR   ADJ    adjective, comparative
JJS   ADJ    adjective, superlative
MD    VERB   verb, modal auxiliary
NN    NOUN   noun, singular or mass
NNP   PROPN  noun, proper singular
NNPS  PROPN  noun, proper plural
NNS   NOUN   noun, plural
RBR   ADV    adverb, comparative
RBS   ADV    adverb, superlative
VB    VERB   verb

Table 2
Used spaCy dependency parsing classes.

DEP     DESCRIPTION
acl     clausal modifier of noun (adjectival clause)
advcl   adverbial clause modifier
advmod  adverbial modifier
amod    adjectival modifier
attr    attribute
dobj    direct object
neg     negation modifier
oprd    object predicate
pcomp   complement of preposition
pobj    object of preposition
prep    prepositional modifier
xcomp   open clausal complement

2. Set of linguistic rules for lexicon-based Sentiment Analysis

We provide here details on the semantic rules used to detect the text fragments semantically connected to a specific concept of interest in an input text. These linguistic rules have been derived experimentally after in-depth natural language analysis [20], and are based on both the syntax and the semantics of the text. Each rule can be seen as a single building block, and the concatenation of these rules makes it possible to explain how a particular sentiment polarity score is evaluated by an underlying lexicon-based SA algorithm, providing more transparency and interpretability of the resulting analysis.

Figure 1: An illustration of the focused fragment of text for the sentence: Despite Ronaldo’s age, his physical shape looks perfect so far.

The overall process is based on the linguistic features of the spaCy⁴ Python library. Tables 1-2 present the main labels assigned by spaCy for part-of-speech tagging and dependency parsing, respectively, that we use in our rule scheme. Based on the POS (i.e., the detected part-of-speech), DEP (i.e., the parsed dependency), and TAG (i.e., the fine-grained part-of-speech tag) labels defined in these tables, the algorithm selects a chunk in a sentence only if it contains a certain concept of interest (Concept) specified as input and matches one of the adopted semantic rules, detailed in the following.

Concept connected to a verb followed by an adjectival complement.
The Concept of interest is associated with a verb (POS = VERB) which is followed by an adjectival complement relation (DEP = acomp), meaning that it connects the verb to an adjectival term which functions as the complement (like an object of the verb) and offers more information about it. The adjective (POS = ADJ) can be in the form of:

a. standard adjective (TAG = JJ);
   Example (Figure 1): ...despite Ronaldo’s age, his physical shape looks perfect so far...
   (Concept=shape → VERB=looks --acomp--> JJ=perfect).

Concept connected to a verb associated with a noun.

In this case Concept is connected to a verb (POS = VERB) which is linked to a noun (POS = NOUN) by means of one of the following relations:

a. direct object (DEP = dobj), i.e. a relation which connects a transitive verb to a nominal representing the recipient of the action of the predicate;
   Example: ...last year Michael received an award for his work...
   (Concept=Michael → VERB=received --dobj--> NOUN=award).

b. attribute (DEP = attr), i.e. a relation which connects a copula verb to a noun being the non-verb phrase predicate of that verb;
   Example: ...in Catalonia taxation has been a heavy deterrent on the development of SMEs...
   (Concept=taxation → VERB=been --attr--> NOUN=deterrent).

⁴ spaCy: Industrial-Strength Natural Language Processing in Python. Available at: https://spacy.io/.

Concept connected to a verb followed by an adverbial modifier.

The Concept of interest is associated with a verb (POS = VERB) which is followed by an adverbial modifier relation (DEP = advmod), meaning that it connects the verb to a non-clausal adverb or adverbial phrase that serves to modify the predicate. The adverb (POS = ADV) can be in the form of:

a. comparative adverb (TAG = RBR);
   Example: ...his power will decline further...
   (Concept=power → VERB=decline --advmod--> RBR=further).

b. superlative adverb (TAG = RBS);
   Example: ...Tim’s attention is best focused on one thing: football!
   (Concept=attention → VERB=focused --advmod--> RBS=best).

Concept connected to a verb followed by an object predicate.

The Concept of interest is associated with a verb (POS = VERB) which is followed by an object predicate relation (DEP = oprd), that is, a non-verb phrase predicate in a small clause that functions like the predicate of an object. In other words, the linked verb is connected to an adjective that qualifies, describes, or renames the object that appears before it. The adjective (POS = ADJ) can be in the form of:

a. standard adjective (TAG = JJ);
   Example: ...the law has been declared unconstitutional...
   (Concept=law → VERB=declared --oprd--> JJ=unconstitutional).

b. comparative adjective (TAG = JJR);
   Example: ...the ECB kept the rates lower than expected...
   (Concept=ECB → VERB=kept --oprd--> JJR=lower).

c. superlative adjective (TAG = JJS);
   Example: ...the FED is keeping asset prices the lowest...
   (Concept=FED → VERB=keeping --oprd--> JJS=lowest).

Concept connected to a verb followed by a prepositional modifier.

The Concept of interest is connected to a verb (POS = VERB) which is followed by a prepositional modifier relation (DEP = prep), that is, a prepositional phrase that modifies the heading verb. The prepositional modifier is linked to an adposition (POS = ADP), which establishes a grammatical relationship linking its complement to another word or phrase in the context. An adposition typically establishes a semantic relationship which may be spatial (in, on, under, ...), temporal (after, during, ...), or of some other type (of, for, via, ...). The adposition is then connected to one of the following terms:

a. a noun (POS = NOUN), by means of an object of preposition relation (DEP = pobj), i.e. a noun phrase that follows a preposition and completes its meaning;
   Example: ...currently Ronaldo is in great shape...
   (Concept=Ronaldo → VERB=is --prep--> ADP=in --pobj--> NOUN=shape).

b. a verb (POS = VERB), by means of a complement of preposition relation (DEP = pcomp), i.e. a clause which is not a pobj and directly connects the preposition with any dependent completing its meaning;
   Example: ...The firm is lowering its profits after paying 1 million euros to the tax office...
   (Concept=firm → VERB=lowering --prep--> ADP=after --pcomp--> VERB=paying).

Concept connected to a verb followed by an open clausal complement or an adverbial clause modifier.

The Concept of interest is in this case associated with a verb (POS = VERB) which is connected to another term by means of one of the following relations:

a. open clausal complement (DEP = xcomp), i.e. a predicative or clausal complement without its own subject;
   Example: ...news reported that the FTSE index could keep losing...
   (Concept=FTSE index → VERB=keep --xcomp--> VERB=losing).

b. adverbial clause modifier (DEP = advcl), i.e. a clause which modifies a verb or another predicate (adjective, etc.) as a modifier, not as a core complement;
   Example: ...industrial production will reach the bottom as it struggles with the current crisis given by the pandemic...
   (Concept=industrial production → VERB=reach --advcl--> VERB=struggles).

Concept associated with an adjectival clause.

The Concept of interest is connected by an adjectival clause (DEP = acl), i.e. a finite or non-finite clause that modifies Concept, to a term being:

a. a verb (POS = VERB);
   Example: ...there exist many tools providing similar benefits...
   (Concept=tools --acl--> VERB=providing).

Concept associated with an adjectival modifier.

In this case Concept is connected by an adjectival modifier relation (DEP = amod), i.e. an adjective phrase that modifies the meaning of the Concept of interest, to a term being:

a. an adjective (POS = ADJ), in the form of:

   i. standard adjective (TAG = JJ);
      Example: ...his top performance is encouraging the rest of the team...
      (Concept=performance --amod--> JJ=top).

   ii. comparative adjective (TAG = JJR);
      Example: ...there is a larger consumption than in the past years...
      (Concept=consumption --amod--> JJR=larger).

   iii. superlative adjective (TAG = JJS);
      Example: ...the manufacturing sector is experiencing the worst decline since World War II...
      (Concept=decline --amod--> JJS=worst).

b. a verb (POS = VERB);
   Example: ...overall it appears to be an encouraging agreement...
   (Concept=agreement --amod--> VERB=encouraging).

3. Conclusion and Outlook

This is ongoing research. We are aware that the presented linguistic rules are heuristics derived experimentally after in-depth natural language analysis. They still require rigorous testing, which we aim to carry out in our ongoing work.

In the near future we are planning to implement an advanced lexicon-based SA method that builds on the described linguistic rules, and to carry out an in-depth performance comparison against other popular SA approaches. This strategy, currently under development, will focus in particular on the economic and financial domains [21], with the goal of providing useful signals to improve forecasting and nowcasting of economic and financial indicators.

4. Acknowledgments

The authors would like to thank the colleagues of the Centre for Advanced Studies at the Joint Research Centre of the European Commission for helpful guidance and support during the development of this research work. The views expressed are purely those of the authors and may not in any circumstance be regarded as stating an official position of the European Commission. The author Dr. Sergio Consoli is also particularly grateful to the Wagner team in Catania, Italy, for inspiring discussions, valuable advice, and support.

References

[1] V. Marx, The big challenges of big data, Nature 498 (2013) 255–260.
[2] K. Ravi, V. Ravi, A survey on opinion mining and sentiment analysis: Tasks, approaches and applications, Knowledge-Based Systems 89 (2015) 14–46.
[3] F. Neri, C. Aliprandi, F. Capeci, M.
Cuadros, T. By, Sentiment analysis on social media, in: Proceedings of the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM 2012, 2012, pp. 919–926.
[4] B. Liu, Sentiment analysis: Mining opinions, sentiments, and emotions, Cambridge University Press, 2015.
[5] E. Cambria, S. Poria, A. Gelbukh, M. Thelwall, Sentiment analysis is a big suitcase, IEEE Intelligent Systems 32 (2017) 74–80.
[6] M. Neethu, R. Rajasree, Sentiment analysis in twitter using machine learning techniques, in: 2013 4th International Conference on Computing, Communications and Networking Technologies, ICCCNT 2013, 2013, pp. 1–5.
[7] L. Zhang, S. Wang, B. Liu, Deep learning for sentiment analysis: A survey, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 8 (2018).
[8] A. Tripathy, A. Agrawal, S. Rath, Classification of sentiment reviews using n-gram machine learning approach, Expert Systems with Applications 57 (2016) 117–126.
[9] C. Khoo, S. Johnkhan, Lexicon-based sentiment analysis: Comparative evaluation of six sentiment lexicons, Journal of Information Science 44 (2018) 491–511.
[10] M. Mostafa, More than words: Social networks’ text mining for consumer brand sentiments, Expert Systems with Applications 40 (2013) 4241–4251.
[11] D. Reforgiato Recupero, V. Presutti, S. Consoli, A. Gangemi, A. G. Nuzzolese, Sentilo: Frame-based sentiment analysis, Cognitive Computation 7 (2015) 211–225.
[12] M. Taboada, J. Brooke, M. Tofiloski, K. Voll, M. Stede, Lexicon-based methods for sentiment analysis, Computational Linguistics 37 (2011) 267–307.
[13] D. Reforgiato Recupero, S. Consoli, A. Gangemi, A. Nuzzolese, D. Spampinato, A semantic web based core engine to efficiently perform sentiment analysis, in: Lecture Notes in Computer Science, volume 8798, 2014, pp. 245–248.
[14] S. Baccianella, A. Esuli, F.
Sebastiani, SentiWordNet 3.0: An enhanced lexical resource for sentiment analysis and opinion mining, in: LREC, volume 10, 2010, pp. 2200–2204.
[15] E. Cambria, R. Speer, C. Havasi, A. Hussain, Senticnet: A publicly available semantic resource for opinion mining, in: 2010 AAAI Fall Symposium Series, volume FS-10-02, 2010, pp. 1–5.
[16] E. Cambria, Y. Li, F. Z. Xing, S. Poria, K. Kwok, Ensemble application of symbolic and subsymbolic AI for sentiment analysis, in: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM ’20), 2020, pp. 105–114.
[17] M. Van De Kauter, D. Breesch, V. Hoste, Fine-grained analysis of explicit and implicit sentiment in financial news articles, Expert Systems with Applications 42 (2015) 4999–5010.
[18] T. Kim, B. Routledge, Informational Privacy, A Right to Explanation, and Interpretable AI, in: Proceedings - 2018 2nd IEEE Symposium on Privacy-Aware Computing, PAC 2018, 2018, pp. 64–74.
[19] L. Gilpin, D. Bau, B. Yuan, A. Bajwa, M. Specter, L. Kagal, Explaining explanations: An overview of interpretability of machine learning, in: Proceedings - 2018 IEEE 5th International Conference on Data Science and Advanced Analytics, DSAA 2018, 2019, pp. 80–89.
[20] D. Reforgiato Recupero, A. Nuzzolese, S. Consoli, V. Presutti, S. Peroni, M. Mongiovì, Extracting knowledge from text using sheldon, a semantic holistic framework for linked ontology data, in: WWW 2015 Companion - Proceedings of the 24th International Conference on World Wide Web, 2015, pp. 235–238.
[21] L. Barbaglia, S. Consoli, S. Manzan, Monitoring the business cycle with fine-grained, aspect-based sentiment extraction from news, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 11985 LNAI (2020) 101–106.