Deep Learning Enhanced with Graph Knowledge for Sentiment Analysis

Fernando Lovera (a), Yudith Cardinale (a), Davide Buscaldi (b), Thierry Charnois (b) and Masun Nabhan Homsi (a,c)

(a) Universidad Simón Bolívar, Caracas, Venezuela
(b) Institut Galilée, Université Paris 13, France
(c) Helmholtz Centre for Environmental Research, Germany

Abstract

The traditional way to address the problem of sentiment classification is based on Machine Learning techniques; however, these models are not able to grasp all the richness of the text that comes from different social media, personal web pages, blogs, etc., because they ignore the semantics of the text. Knowledge Graphs give a way to extract structured knowledge from images and texts, in order to facilitate their semantic analysis. In this work, we propose a new hybrid approach for Sentiment Analysis based on Knowledge Graphs and Deep Learning techniques, to identify the sentiment polarity (positive or negative) in short documents, particularly in tweets. We represent the tweets using graphs; then graph similarity metrics and a Deep Learning classification algorithm are applied to produce sentiment predictions. This approach facilitates the traceability and explainability of the classification results, since it is possible to visually inspect the graphs. We compare our proposal with Deep Learning models based on character n-gram embeddings. Results show that our proposal outperforms classical n-gram models, with a recall of up to 89% and an F1-score of 88%.

Keywords

Sentiment Analysis, Knowledge Graph, Long-Short Term Memory (LSTM), Graph Similarities

1. Introduction

Users of social networks, like Twitter, make use of such platforms to express opinions, as well as emotions, on any topic. In this context, intelligent classification models for Sentiment Analysis have demonstrated efficiency in predicting feelings in texts and in determining users' perception of aspects of everyday life [1].
In general, the idea is to predict the results or trends of a particular topic based on sentiment [2], for example in contexts of product preferences in the market [3], film preferences [4], or political opinions [5]. However, in Twitter, predicting sentiment is a challenging problem since most tweets do not have a well-formed grammatical structure. Nowadays, there is an increasing interest in improving Sentiment Analysis techniques in order to reach more accurate, traceable, and explainable results, as well as better performance in real-time applications [6].

X-SENTIMENT: 6th International Workshop held at ESWC on eXplainable SENTIment Mining and EmotioN deTection, June 07, 2021, Hersonissos, Greece
Email: {flovera,ycardinale}@usb.ve (Y. Cardinale); {davide.buscaldi,thierry.charnois}@lipn.univ-paris13.fr (T. Charnois); masun.homsi@ufz.de (M.N. Homsi)
ORCID: 0000-0002-5966-0113 (Y. Cardinale); 0000-0003-1112-3789 (D. Buscaldi); 0000-0001-7427-6198 (M.N. Homsi)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Several computational techniques can be applied to solve the problem of predicting sentiment [7]. They range from linear models to Deep Learning [8], and some research even uses Linked-Data [9]. For example, the use of Machine Learning allows defining a classifier that learns to differentiate positive and negative sentiments and then determines the polarity of new texts [10]. Knowledge Graphs provide a way to extract structured knowledge from images and texts, to facilitate the semantic analysis. This knowledge is applied to determine sentiment polarities based on similarity measurements between the text's graph and pre-determined polarities' graphs. Therefore, Knowledge Graphs have broad application prospects in many areas, including health and computer networks, among others [11].
In this context, we propose a new hybrid approach that considers the words of the text connected to their definition. We address the problem of predicting sentiment in short pieces of text, in particular in tweets, by treating the words of a tweet as entities that are potentially connected with other entities through the expansion of the tweet's Knowledge Graph representation. We represent the tweets using graphs; then graph similarity metrics (tweets' graphs vs. polarities' graphs) and a Deep Learning classification algorithm are applied to produce sentiment predictions. We compare our proposal with Deep Learning models based on character n-gram embeddings. Results show that our proposal is able to outperform classical n-gram models, reaching a recall of up to 89% and an F1-score of 88%. These results demonstrate that the use of Knowledge Graphs opens the opportunity of exploring the use of semantics in the task of Sentiment Analysis, as well as facilitating the traceability and explainability of the classification results, since these graphs can be visually inspected. Moreover, the construction of Knowledge Graphs is not affected by the size of the text or the use of dialects.

2. Related Work

Classical approaches for Sentiment Analysis on social media are based on linear techniques. The study described in [12] presents a performance analysis of several tools (AlchemyAPI, Lymbix, ML Analyzer, among others) based on linear techniques for predicting sentiment, including Support Vector Machines (SVMs), Decision Trees, and Random Forest models. The authors combine different media sources (Twitter, reviews, and news) to build the datasets. In the experiments, the best tools reached an accuracy of 84%, with a Random Forest model.
The study concludes that the accuracy tends to decrease when the texts get longer (this might be because the model is unable to capture long-distance dependencies between words and to detect other sentiments, i.e., neutral sentiment) and that combining the tools with a meta-classifier can enhance the accuracy of the predictions.

Some other traditional approaches supported by Machine Learning techniques are based on language features. Pang and Lee [13] investigate the performance of various Machine Learning techniques, such as Naive Bayes, Maximum Entropy, and SVM, in the domain of opinions on movies. From their analysis, they obtained 82.9% accuracy with their model, using SVM with unigrams. Normally, the features used for a sentiment classifier are extracted by Natural Language Processing (NLP). These NLP techniques are mainly based on the use of n-grams, although the use of bag-of-words is also popular for this task. Many studies show relevant results using bag-of-words as a text representation for object categorization [14, 15].

Table 1: Deep Learning techniques and results

| Work | Model | Dataset | Result |
|---|---|---|---|
| Yanagimoto et al., 2013 [19] | DNN | T&C New | F-score of 90.8% of accuracy |
| Li et al., 2014 [20] | RNDM | 2270 movie reviews from websites | Accuracy of 90.8% |
| Severyn and Moschitti, 2015 [16] | CNN | Semeval-2015 | F-measure score sub-task A: 84.19% and sub-task B: 64.69% |
| Yanmei and Yuda, 2015 [17] | CNN | 1000 micro-blog comments | Statistical model with average of 85.4% of accuracy |
| Silhavy et al., 2016 [21] | HBRNN | 150,175 labelled reviews from 1500 hotels | HBRNN outperformed the rest of the methods |
| Arras, Leila, Montavon, 2017 [18] | RNN | 11,855 single sentences from movies review | Accuracy of 82.9% for binary classification (positive and negative) |

Researchers exploit the NLP topics and use them to implement Deep Learning models based on neural networks that have more than three layers.
In general, in these studies, predicting sentiment is a task that is performed by Deep Learning models. These models include Convolutional Neural Networks (CNN) [16, 17], Recurrent Neural Networks (RNN) [18], Deep Neural Networks (DNN) [19], Recursive Neural Deep Model (RNDM) [20], and Deep Belief Networks. Some studies combine several models; the resulting models are called Hybrid Neural Networks, for example the Hierarchical Bidirectional Recurrent Neural Network (HBRNN) [21]. In Table 1, we compare some relevant and recent works based on Deep Learning techniques, in terms of performance. As shown in Table 1, results in different studies demonstrate how well Deep Learning techniques perform with different datasets. In particular, models based on RNN have obtained an accuracy above 80%.

An emerging strategy to perform Sentiment Analysis is based on using graphs as feature representation, which has been shown to enhance text-mining applications. Nonetheless, one of the main challenges with graphs is related to their construction, which is not straightforward and depends on the grammatical structure of the text. Only a few studies have approached this strategy. In [22], the authors use a co-occurrence graph that represents relationships among terms of a document; they use centrality measures to extract sentiment words that can express the sentiment of the document; then, they use these words as features for supervised learning algorithms and obtain the polarity of the new document.

In [23], the authors use graphs to represent the sequences of words in a document. They apply graph similarity metrics and classification algorithms, such as SVM, to predict sentiment. In [24], the authors try to leverage a deep graph-based text representation of sentiment polarity by combining graphs and using them in a representation learning approach. Table 2 shows a comparative evaluation of these studies.
Even though graph-based techniques use a different data structure for features and a different learning pattern, they reach results as good as those of the techniques based on Deep Learning.

Table 2: Graph-based techniques for sentiment classification and results

| Work | Model | Dataset | Result |
|---|---|---|---|
| E. Castillo, O. Cervantes, 2015 [22] | Co-occurrence graphs | SemEval 2015 | 76% for positive and 68.04% for neutral classes |
| J. Violos, 2016 [23] | Word-graph model based | Twitter dataset | 75.07% of accuracy |
| K. Bijari, H. Zare, 2019 [24] | Sentence-level graph-based text representation | IMDB dataset | 88.31% for negative and 86.60% for positive classes |

Our main contribution is related to Knowledge Graphs, whose main advantage is that their construction is not affected by the size of the text or the use of dialects and that they can be visually inspected. Results of previous studies suggest that the use of Deep Learning for the task of sentiment prediction produces accurate models. Therefore, combining Knowledge Graphs with Deep Neural Networks allows obtaining a powerful model, able to produce accurate, traceable, and explainable results in predicting sentiment labels.

3. Knowledge Graphs: Preliminaries

Knowledge Graphs (KG) are inspired by interlinking data, modeling unstructured information in a meaningful way. A KG is also known as a Knowledge Base (technology that stores complex unstructured or structured information/data), and it is defined as a network in which the nodes indicate entities and the edges indicate their relation. DBpedia (footnote 1), YAGO (footnote 2), and SUMO (footnote 3) are three examples of huge KGs released in the past decade; they have become an outstanding resource for NLP applications, such as Question-Answering (QA) [25]. Formally, the essence of a KG is its triples, whose general form is given in Def. 1.

Definition 1. Knowledge Graph ($KG$). A knowledge graph, denoted as $KG$, is a triple $KG = (E, R, F)$, where $E = \{e_1, e_2, \ldots, e_n\}$ is a set of entities, $R = \{r_1, r_2, \ldots, r_n\}$ represents a set of binary relations, and $F \subseteq E \times R \times E$ represents the relationships between entities (fact triple set).

Since the KG stores real-world information in RDF-style triplets (footnote 4), such as $(head, relation, tail)$, it can be employed to support the representation of knowledge in different applications. For example, in the medical industry it can provide a clear visual representation of diseases, as mentioned in [26]; in e-commerce, it maps clients' purchase intentions to sets of candidate products, as mentioned in [27]. The construction of a KG is based on its entities (also known as nodes) and their interrelation with other objects, organized in the form of a graph. Each entity is able to share its knowledge with other entities. Figure 1 shows an example of a KG representing the following triples: ('Da Vinci', 'is a', 'Person'), ('Da Vinci', 'painted', 'Mona Lisa'), ('Mona Lisa', 'is located in', 'Louvre').

Figure 1: Extended graph with two entities and its relationship.

Footnote 1: http://aksw.org/Projects/DBpedia.html
Footnote 2: https://yago-knowledge.org/
Footnote 3: http://www.adampease.org/OP/index.html
Footnote 4: http://www.w3.org/TR/rdf11-concepts/

4. Sentiment Analysis based on KG: Our Proposal

The proposed method is inspired by other works that are related to the representation of text as graphs [23]. We use more recent techniques, namely Knowledge Graphs and Deep Learning, which differ from the traditional n-gram representation, with the aim of better understanding the sentiment. KGs and a vector representation of text drive the Sentiment Analysis task in tweets. Each tweet and the sentiment polarities (positive and negative) are represented as a KG (see Def. 1).
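To make Def. 1 concrete, the following minimal sketch stores the three fact triples of the Figure 1 example and derives the sets $E$, $R$, and $F$ from them. It is only an illustration: the function names `build_kg` and `neighbours` are hypothetical and are not part of the method described in this paper.

```python
# Fact triples (head, relation, tail) taken from the Figure 1 example.
FACTS = {
    ("Da Vinci", "is a", "Person"),
    ("Da Vinci", "painted", "Mona Lisa"),
    ("Mona Lisa", "is located in", "Louvre"),
}


def build_kg(facts):
    """Return (E, R, F) as in Def. 1, derived from a set of fact triples."""
    entities = {head for head, _, _ in facts} | {tail for _, _, tail in facts}
    relations = {relation for _, relation, _ in facts}
    return entities, relations, set(facts)


def neighbours(facts, head):
    """Hypothetical helper: sorted (relation, tail) pairs leaving one entity."""
    return sorted((r, t) for h, r, t in facts if h == head)


E, R, F = build_kg(FACTS)
print(sorted(E))
print(neighbours(F, "Da Vinci"))
```

Keeping the graph as a plain set of triples makes the set operations used by edge-based similarity measures straightforward; a graph library could be substituted without changing the idea.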
These KGs are built from a subset of the training set, i.e., one part of the training set is taken to produce the KGs representing sentiment polarities (one for positive, one for negative) and the other part produces a KG for every tweet. The similarity comparison between the sentiment polarities and the individual KGs produces vectors used to train a classifier model, in this case an LSTM network and a Bi-LSTM network. On the test set, each new tweet to be classified is transformed into a KG and then we measure its similarity to the polarity graphs. Deep Learning models, combined with the explicit semantics of texts represented by KGs and with the similarity metrics, facilitate the traceability and explainability of the classification results, since these graphs can be visually inspected, while the accuracy of the results is ensured.

4.1. Dataset

The dataset used for the implementation of our approach is Sentiment140 (footnote 5), which contains 1,600,000 tweets in English, labeled positive, negative, and neutral, as well as meta-data describing each tweet. For the purposes of Sentiment Analysis, only the content of the tweet text along with its tag is required. However, the neutral tag is not considered in this work, because we are only interested in tweets that contain a polarity, i.e., that express a sentiment. The dataset was divided into two parts, one for training the model and the rest for testing it. The first 80% of the dataset was used for training and the other 20% for testing. With this last 20%, we repeated the process of transforming the tweets into small KGs and then measuring the similarities with the same metrics explained in Section 4.4. Then, we ran the model with these new similarities as input and obtained a predicted label as output. With these predicted labels we computed the $F_1$ score of the model to measure its performance.

Footnote 5: http://cs.stanford.edu/people/alecmgo/trainingandtestdata.zip

4.2. Pre-processing

The pre-processing of the raw text involves: removal of characters that do not help to detect sentiment, such as HTML characters, elements that contain the special character "@", URLs, and hashtags; transformation to lower case, since the analysis is case-sensitive; expansion of English contractions, such as don't, won't; elimination of long words (we eliminate tweets that contain words longer than 45 characters); and elimination of stop words, such as the, a, this. Nonetheless, it does not include lemmatization, since we require the inflected form of the words rather than their normal form.

4.3. Graph construction

The automatic construction of a KG demands finding a way to recognize the entities and their relationships.

Figure 2: Tree parser representation of the sentence: "The young programmer recently won the ICPC global tournament".

Figure 3: Knowledge Graph representation of the sentence: "The young programmer recently won the ICPC global tournament".

The following steps describe the KG construction process:

1. Sentence segmentation: Split the text (a tweet, in this case) into sentences. This way, we obtain exactly one object and one subject per sentence.

2. Entities extraction: An entity is a Noun or a Noun Phrase (NP). We can extract a single-word entity from a sentence with the help of Parts of Speech (PoS) tags. For example, in the sentence "Rafael won the first prize", the PoS tags classify "Rafael" as a Nominal Subject (nsubj) and "prize" as a Direct Object (dobj); both are syntactic dependency tags that contain the information needed for the formation of the KG entities. For most sentences, PoS tags alone are almost enough. Nonetheless, for some sentences the entities span multiple words; therefore, the syntactic dependency tags are not sufficient.
For example, in the sentence "The 42-year-old won the prize", "old" is classified as the nsubj; nonetheless, we would like to extract "42-year-old" instead. The "42-year" is classified as an adjectival modifier (amod), i.e., it is a modifier of "old". Something similar happens with the dobj. In this case, we do not have modifiers but compound words (a collection of words that form a new term with a different meaning); for example, instead of having the word "prize" in our phrase, we have "ICPC global tournament". The PoS tags only retrieve "tournament" as the dobj; however, we also want to extract the compound words "ICPC" and "global". Hence, we need to extract the subjects and objects along with their punctuation marks, modifiers, and compound words. Therefore, we parse the dependency tree of the sentence. For this, we extract the modifier of the subject (amod in the dependency tree).

For example, for the sentence "The young programmer recently won the ICPC global tournament", the parse tree gives information about the PoS tags, defining the determiners, verbs, modifiers, and subjects, and grouping them in verb phrase (VP) and noun phrase (NP) sub-trees, as shown in Figure 2. In this example, the entity "young programmer" is identified as the subject and "ICPC global tournament" as the object of the sentence. Therefore, for these cases, it is necessary to include modifiers and adverbs in the entities of the KGs.

3. Relationships extraction: To extract the relation between nodes, we assume that it refers to the main verb of the sentence. Therefore, the main verb represents the relationship between two entities. In the sentence in Figure 2, the predicate is "won", which is also tagged as "ROOT" or main verb.

Figure 4: Similarity measurements

4. Building the knowledge graph: In order to build the KG, it is necessary to work with a network in which the nodes are the entities and the edges between the nodes represent the relations between the entities.
It needs to be a directed graph, which means that the relation between two nodes is unidirectional. In Figure 3, we show a KG representation of the parse tree presented in Figure 2. The entities of this small KG are "young programmer" and "ICPC global tournament"; the relation between the entities is the main verb, in this case the verb "won". In this way, we construct the KG given a sentence as input.

The two polarity KGs are constructed from a set of positive-tagged tweets and a set of negative-tagged tweets, respectively, taken from the training dataset. Then, with the other part of the training dataset, a KG is generated for each tweet.

4.4. Graph similarities

The similarity between the graph that represents the tweet and the polarity graphs expresses how the tweet is related to one polarity or to the other. If the graph of a tweet is more similar to the positive polarity graph, this tweet is considered as expressing a positive sentiment (as shown in Figure 4). Each similarity distance means the percentage of correlation between the new graph and the polarity graphs. If a tweet is more related to the positive polarity (i.e., its correlation with this polarity is higher), a positive label is assigned to this tweet. Otherwise, a negative label is assigned. To calculate the similarity of a tweet KG, in this work, we use three graph similarity metrics:

1. Containment similarity measurement: It expresses the percentage of common edges between the graphs, taking the size of the smaller graph as the factor of this measurement. Given $KG_t$ as the KG of a tweet and $KG_s$ the KG of a polarity, Eq. (1) calculates the containment similarity measurement between these two graphs, where $e$ refers to the edges of the graph and $\mu(e, KG_s)$ indicates whether edge $e$ also belongs to $KG_s$ (1 if it does, 0 otherwise).

$$\mathrm{containment}(KG_t, KG_s) = \frac{\sum_{e \in KG_t} \mu(e, KG_s)}{\min(|KG_t|, |KG_s|)} \tag{1}$$

2. Maximum common sub-graph similarity measurement: Given two graphs, $KG_1$ and $KG_2$, their maximum common sub-graph is a sub-graph of both graphs such that there is no other sub-graph of $KG_1$ and $KG_2$ with more nodes [28]. The measurement of the maximum common sub-graph is based on the size of the common sub-graph between the two graphs. Detecting the maximum common sub-graph between two graphs with labeled nodes is a linear problem. Eq. (2) is used to calculate the maximum common sub-graph between $KG_t$ and $KG_s$, where $MCSN$ is a function that returns the number of nodes contained in the maximum common sub-graph of these graphs.

$$\mathrm{max\_subgraph\_nodes}(KG_t, KG_s) = \frac{MCSN(KG_t, KG_s)}{\min(|KG_t|, |KG_s|)} \tag{2}$$

3. Maximum common sub-graph number of edges: It takes into account the number of common edges contained in the maximum common sub-graph instead of the nodes in common. This is reflected in Eq. (3), where the Maximum Common Subgraph of Edges ($MCSE$) is the number of edges contained in the maximum common sub-graph.

$$\mathrm{max\_subgraph\_edges}(KG_t, KG_s) = \frac{MCSE(KG_t, KG_s)}{\min(|KG_t|, |KG_s|)} \tag{3}$$

$MCSE(KG_t, KG_s)$ is defined as the number of edges that are contained in the maximum common sub-graph and maintain the same direction in both $KG_t$ and $KG_s$. It is also possible to measure the similarity using the edges without taking into account whether the direction is maintained. This is what we measure in Eq. (4), where $MCSUE(KG_t, KG_s)$ represents the Maximum Common Subgraph of Undirected Edges (i.e., the total number of edges contained in the maximum common sub-graph regardless of their direction).

$$\mathrm{max\_subgraph\_undirected\_edges}(KG_t, KG_s) = \frac{MCSUE(KG_t, KG_s)}{\min(|KG_t|, |KG_s|)} \tag{4}$$

Thus, we obtain eight similarity metrics (four for each polarity). These metrics form the vectors used to train a classifier and output predictions.

4.5. Dimensionality reduction

Typical Machine Learning methodologies involve feature selection or dimensionality reduction, in order to improve metrics such as accuracy or the mean squared error [29]. In this work, the mutual information criterion for graphs is used. This criterion is applied after the creation of the KGs for the sentiment classes and tweets in the training process. Dimensionality reduction discards edges so that the graphs become smaller; therefore, computational resources are optimized. We use the edge filter with the containment similarity metric. The mutual information criterion between the term $t$ (edge) and the category $c$ (sentiment class) is defined as in Eq. (5) [30], where $A$ is the number of times that $t$ exists in the graph of sentiment polarities and in the graphs of the tweets of the second part of the training process; $B$ is the number of times that $t$ does not exist in the graph of sentiment polarities but exists in the graphs of the tweets of the second part of the training process; $C$ is the number of times that the tweets of the second part of the training process express the sentiment $c$ and do not contain $t$; and $N$ is the total number of documents in the second part of the training process.

$$PI(t, c) \approx \log\left(\frac{A \cdot N}{(A + C) \cdot (A + B)}\right) \tag{5}$$

Finally, we calculate the average factor, $I_{avg}(t, c)$, to know the contribution of the edge to the sentiment graphs. For every edge, Eq. (6) computes the global mutual information, where $P_r(S_i)$ is the percentage of tweets from the second part of the training process that express the sentiment $S_i$. Naturally, if this contribution for a certain edge is smaller than a certain threshold (chosen according to the problem or the criteria of the researcher), then the edge is discarded.

$$I_{avg}(t, c) = \sum_{i \in \{+, -\}} P_r(S_i) \cdot PI(t, c) \tag{6}$$

Figure 5: Transformation stages of a dataset to its vector form in the Knowledge Graph Method.

Tweets have a limitation in length, thus they are short pieces of text.
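The edge filter of Eqs. (5) and (6), combined with the containment metric of Eq. (1), can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: edges are (head, relation, tail) tuples, $|KG|$ is taken as the number of edges, $A$, $B$, and $C$ are counted over tweets in the usual document-frequency sense, the pointwise term is evaluated once per sentiment class, and classes in which an edge never occurs are skipped to avoid the logarithm of zero. All function names and the toy data are hypothetical.

```python
import math


def containment(kg_t, kg_s):
    """Eq. (1): share of tweet edges that also appear in the polarity graph."""
    smaller = min(len(kg_t), len(kg_s))  # assumption: |KG| = number of edges
    if smaller == 0:
        return 0.0
    return len(kg_t & kg_s) / smaller


def pointwise_mi(a, b, c, n):
    """Eq. (5): PI(t, c) ~ log(A * N / ((A + C) * (A + B))); requires A > 0."""
    return math.log((a * n) / ((a + c) * (a + b)))


def average_mi(edge, tweets):
    """Eq. (6), assuming PI is evaluated separately for each class.

    tweets: list of (edge_set, label) pairs with label "+" or "-".
    """
    n = len(tweets)
    total = 0.0
    for label in ("+", "-"):
        in_class = [edges for edges, lab in tweets if lab == label]
        # A: tweets of this class that contain the edge.
        a = sum(edge in edges for edges in in_class)
        # B: tweets of the other class that contain the edge.
        b = sum(edge in edges for edges, lab in tweets if lab != label)
        # C: tweets of this class that do not contain the edge.
        c = len(in_class) - a
        if a > 0:  # assumption: skip classes where the edge never occurs
            total += (len(in_class) / n) * pointwise_mi(a, b, c, n)
    return total


def filter_edges(polarity_kg, tweets, threshold):
    """Keep only the edges whose average contribution reaches the threshold."""
    return {e for e in polarity_kg if average_mi(e, tweets) >= threshold}


# Toy data, made up for illustration only.
LOVE = ("i", "love", "it")
HATE = ("i", "hate", "it")
WATCH = ("we", "watch", "film")
TWEETS = [({LOVE}, "+"), ({LOVE, WATCH}, "+"), ({HATE}, "-"), ({HATE, WATCH}, "-")]

positive_kg = {LOVE, WATCH}
filtered_kg = filter_edges(positive_kg, TWEETS, threshold=0.1)
new_tweet = {LOVE, ("it", "is", "great")}
print(containment(new_tweet, positive_kg), containment(new_tweet, filtered_kg))
```

In this toy example the edge shared equally by both classes carries no information and is filtered out, so the polarity graph shrinks and the containment value of the new tweet changes accordingly; the threshold of 0.1 is arbitrary.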
The classification model must therefore be able to deal with such short texts and also with the long dependencies among the words of a tweet (i.e., dependencies between words that are not strictly adjacent words or morphemes). In this sense, our work is based on LSTM and Bi-LSTM networks, which are able to manage these text conditions [31].

In Figure 5, we illustrate the steps for the construction of the training set and the respective KGs. At the 1st step, we have the training set (a percentage of the whole dataset). At the 2nd step, we divide it into two parts. The first part is used to build the KGs that represent the sentiment classes (positive and negative polarities); we obtain one positive polarity KG and one negative polarity KG, derived from tweets tagged as positive and negative, respectively (3rd step). With the second part, in the 3rd step, we create a KG for every tweet following the process explained in Section 4.3. Then, we measure the similarity between every KG representing a tweet and the positive and negative polarity KGs, resulting in a vector of eight dimensions (four measurements against the positive KG and four against the negative KG, see Section 4.4) and a sentiment label (4th step). Naturally, we take similar steps as in Figure 5 to create a test set.

The classifier's input vectors contain the eight different similarity measurements. We trained the classifier using the 10-fold cross validation approach. In every fold we use 90% of the tweets for training and 10% of the tweets for testing. No additional data were used, and no other statistical measurements were taken.

5. Results

In order to evaluate our approach, we implemented an LSTM enhanced with the KG version (LSTM with KG) and a Bi-LSTM enhanced with the KG model (Bi-LSTM with KG). We also implemented the classical versions of an LSTM (Character n-gram based LSTM) and a Bi-LSTM (Character n-gram based Bi-LSTM) as baselines, representing state-of-the-art techniques for learning long-distance dependencies. For the traditional character n-gram embedding versions, the original input is the sentence; then, using max-pooling, the most important n-grams are extracted, combining the maximum values of each layer of the network in only one vector, and finally a Sigmoid function is used to make the prediction.

Table 3: Results for experiments.

| Model | $F_1$-score | Precision | Recall |
|---|---|---|---|
| LSTM with KG | 0.884 | 0.880 | 0.890 |
| Bi-LSTM with KG | 0.757 | 0.690 | 0.840 |
| Character n-gram based LSTM | 0.849 | 0.840 | 0.860 |
| Character n-gram based Bi-LSTM | 0.852 | 0.851 | 0.856 |

For the implementation of the four versions, we use Python 3 with the Keras library and TensorFlow, using GoogleGPUs (footnote 6). GoogleGPUs is a free cloud-based version of a Jupyter notebook that allows optimizations for big arrays in Colab Notebooks. The hardware available in these notebooks is a 12GB NVIDIA Tesla K80 GPU, which can be used for up to 12 hours continuously. The configuration is as follows: (i) the batch size is 500 and the maximum length of the input sequence is 280, since 280 is the maximum length of a tweet; the embedding matrix is initialized to 0; (ii) an Embedding layer created within the Sequential model given by Keras; (iii) a SpatialDropout1D layer to delete 1D feature maps, promoting independence between features; and (iv) an LSTM or Bi-LSTM layer given by Keras.

In our experiments we observed that using the Dimensionality Reduction helps to improve the precision of the method, since it decreases the number of edges in the KG. Results are shown in Table 3 in terms of $F_1$ score, precision, and recall. We observe that our approach based on KG has a higher score with LSTM (LSTM with KG in Table 3) than with Bi-LSTM (Bi-LSTM with KG in Table 3); nonetheless, Bi-LSTM is a more complex model with fast convergence. Comparing both versions of our model against the two classical Deep Learning models, Table 3 shows that the Bi-LSTM with KG model presents the worst results.
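As a quick consistency check on Table 3, the $F_1$ score is the harmonic mean of precision and recall, so it can be recomputed from the other two columns. The sketch below does this for the four models; the function name and the dictionary layout are illustrative choices, and the numbers are copied from Table 3.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 3.
TABLE_3 = {
    "LSTM with KG": (0.880, 0.890, 0.884),
    "Bi-LSTM with KG": (0.690, 0.840, 0.757),
    "Character n-gram based LSTM": (0.840, 0.860, 0.849),
    "Character n-gram based Bi-LSTM": (0.851, 0.856, 0.852),
}

for model, (p, r, reported) in TABLE_3.items():
    print(f"{model}: recomputed F1 = {f1_score(p, r):.3f}, reported = {reported:.3f}")
```

The recomputed values agree with the reported ones to within about 0.002, which is compatible with the precision and recall columns having been rounded before publication.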
We obtained better results with LSTM, for both the KG and the character n-gram based versions. In both cases, the $F_1$ scores are quite similar, with the results of LSTM with KG being better than those of Character n-gram based LSTM; actually, the precision and recall values of LSTM with KG are the best among the four models, meaning that we built a model able to make correct predictions and that the ratio of correctly predicted values tends to be high as well.

The results show that the combination of KG with a classifier is appropriate for detecting the feelings expressed in short texts. Besides reaching results in line with the state of the art for Sentiment Analysis, the advantages of our approach also lie in several aspects: (i) the ease of creating KGs that can be visually inspected, leading to more traceable and explainable classification results; (ii) the explicit semantics of texts represented by KGs, which capture grammar, long-distance dependencies, and neologisms, and are suitable for detecting sentiment in micro-blogging texts; and (iii) this approach represents a way of doing Sentiment Analysis that can be used in other contexts, for example to recognize topics based on the semantics provided by KGs and expanded with the use of Linked-Data, which are indeed complex KGs.

Footnote 6: https://colab.research.google.com/notebooks/intro.ipynb#recent=true

6. Conclusion

In this work, we propose an innovative Sentiment Analysis method, combining Knowledge Graphs with Deep Learning techniques. Knowledge Graphs can capture structural information of a tweet and also part of its meaning. Our proposed model is based on the use of graph techniques and similarity metrics between graphs. The result of the comparison of graphs (i.e., graph similarity measurements) is a vector, which is later fed to the neural networks, which recognize the polarity of the sentiment. The study of sentiment through Knowledge Graphs and Deep Learning represents an interesting challenge that does not escape from limitations.
It could be scenarios where the data is limited (small dataset) and this can lead to an under-fitting classifier unable to generalize a rule for distinguishing sentiments. The recognition of entities and their connections is crucial to our approach, the use of PoS tagging allows performing this operation. Results are comparative with the ones encountered on literature for the same problem. Our methodology is simple, suitable for detecting sentiment in micro-blogging texts, provides more informed and traceable Deep Learning algorithms that produce more accuracy scores, and with the potential to be expanded with the use of Linked- Data, thanks to the properties of the Knowledge Graphs. The use of Knowledge Graphs has proven to be useful to detect semantic information, there- fore this type of graph can be applied in other areas. Future work will investigate how the Knowledge Graphs interact with other applications such as cross-domain polarity classification [32], since they used semantic networks and these can be enhanced with the use of knowledge graphs. Another future work for the Knowledge Graphs for Sentiment Analysis is in the area of Irony detection [33], where the sentiment makes helps to differentiate between an ironic and non-ironic tweet. References [1] M. M. Mostafa, More than words: Social networks’ text mining for consumer brand sen- timents, Expert Syst. Appl. 40 (2013) 4241–4251. [2] E. Cambria, Affective computing and sentiment analysis, IEEE Intelligent Systems 31 (2016) 102–107. [3] M. A. Mirtalaie, O. K. Hussain, E. Chang, F. K. Hussain, Sentiment analysis of specific product’s features using product tree for application in new product development, in: Internat. Conf. on Intelligent Networking and Collaborative Systems, 2017, pp. 82–95. [4] E. Chu, D. Roy, Audio-visual sentiment analysis for learning emotional arcs in movies, in: IEEE Internat. Conf. on Data Mining, 2017, pp. 829–834. [5] D. J. S. Oliveira, P. H. d. S. Bermejo, P. A. 
dos Santos, Can social media reveal the preferences of voters? a comparison between sentiment analysis and traditional opinion polls, Journal of Information Technology & Politics 14 (2017) 34–45.
[6] S. Rosenthal, N. Farra, P. Nakov, Semeval-2017 task 4: Sentiment analysis in twitter, in: Internat. Workshop on Semantic Evaluation, 2017, pp. 502–518.
[7] D. M. E.-D. M. Hussein, A survey on sentiment analysis challenges, Journal of King Saud University-Engineering Sciences 30 (2018) 330–338.
[8] Z. Jianqiang, G. Xiaolin, Z. Xuejun, Deep convolution neural networks for twitter sentiment analysis, IEEE Access 6 (2018) 23253–23260.
[9] J. F. Sánchez-Rada, M. Torres, C. A. Iglesias, R. Maestre, E. Peinado, A linked data approach to sentiment and emotion analysis of twitter in the financial domain, in: WaSABi-FEOSW@ESWC, 2014.
[10] M. Taboada, Sentiment analysis: An overview from linguistics, Annual Review of Linguistics 2 (2016).
[11] X. Wang, X. He, Y. Cao, M. Liu, T.-S. Chua, Kgat: Knowledge graph attention network for recommendation, in: Int. Conf. on Knowl. Discover & Data Mining, 2019, pp. 950–958.
[12] M. Cieliebak, O. Dürr, F. Uzdilli, Potential and limitations of commercial sentiment detection tools, in: ESSEM@AI*IA, 2013, pp. 47–58.
[13] B. Pang, L. Lee, et al., Opinion mining and sentiment analysis, Foundations and Trends® in Information Retrieval 2 (2008) 1–135.
[14] J. Krapac, J. Verbeek, F. Jurie, Modeling spatial layout with fisher vectors for image categorization, in: Internat. Conf. on Computer Vision, IEEE, 2011, pp. 1487–1494.
[15] K. Sikka, T. Wu, J. Susskind, M. Bartlett, Exploring bag of words architectures in the facial expression domain, in: European Conference on Computer Vision, 2012, pp. 250–259.
[16] A. Severyn, A. Moschitti, Twitter sentiment analysis with deep convolutional neural networks, in: Int. Conf. on Research and Develop. in Infor. Retrieval, 2015, pp. 959–962.
[17] L. Yanmei, C.
Yuda, Research on chinese micro-blog sentiment analysis based on deep learning, in: Internat. Symp. on Comput. Intelligence and Design, 2015, pp. 358–361.
[18] L. Arras, G. Montavon, K.-R. Müller, W. Samek, Explaining recurrent neural network predictions in sentiment analysis, arXiv preprint arXiv:1706.07206 (2017).
[19] H. Yanagimoto, M. Shimada, A. Yoshimura, Document similarity estimation for sentiment analysis using neural network, in: Int. Conf. Computer and Inf. Science, 2013, pp. 105–110.
[20] C. Li, B. Xu, G. Wu, S. He, G. Tian, H. Hao, Recursive deep learning for sentiment analysis over social data, in: Int. Conf. on Web Intell. and Intelligent Agent Tech., 2014, pp. 180–185.
[21] R. Silhavy, R. Senkerik, Z. K. Oplatkova, P. Silhavy, Z. Prokopova, Artificial intelligence perspectives in intelligent systems, in: Computer Sc. On-line Conf., 2016, pp. 249–261.
[22] E. Castillo, O. Cervantes, D. Vilarino, D. Báez, A. Sánchez, Udlap: sentiment analysis using a graph-based representation, in: Int. Workshop SemEval, 2015, pp. 556–560.
[23] J. Violos, K. Tserpes, E. Psomakelis, K. Psychas, T. Varvarigou, Sentiment analysis using word-graphs, in: Int. Conf. on Web Intelligence, Mining and Semantics, 2016, pp. 1–9.
[24] K. Bijari, H. Zare, E. Kebriaei, H. Veisi, Leveraging deep graph-based text representation for sentiment polarity applications, arXiv preprint arXiv:1902.10247 (2019).
[25] X. Huang, J. Zhang, D. Li, P. Li, Knowledge graph embedding based question answering, in: Internat. Conf. on Web Search and Data Mining, 2019, pp. 105–113.
[26] M. Rotmensch, Y. Halpern, A. Tlimat, S. Horng, D. Sontag, Learning a health knowledge graph from electronic medical records, Scientific reports 7 (2017) 1–11.
[27] K. K. Teru, W. L. Hamilton, Inductive relation prediction on knowledge graphs, arXiv preprint arXiv:1911.06962 (2019).
[28] H. Bunke, On a relation between graph edit distance and maximum common subgraph, Pattern Recognition Letters 18 (1997) 689–694.
[29] S.
Chagheri, S. Calabretto, C. Roussey, C. Dumoulin, Feature vector construction combining structure and content for document classification, in: Internat. Conf. on Sciences of Electronics, Technologies of Information and Telecommunications, 2012, pp. 946–950.
[30] Y. Xu, G. J. Jones, J. Li, B. Wang, C. Sun, A study on mutual information-based feature selection for text categorization, J. of Computat. Informat. Systems 3 (2007) 1007–1012.
[31] B. Wang, W. Liu, G. Han, S. He, Learning long-term structural dependencies for video salient object detection, IEEE Transactions on Image Processing 29 (2020) 9017–9031.
[32] M. Franco-Salvador, F. L. Cruz, J. A. Troyano, P. Rosso, Cross-domain polarity classification using a knowledge-enhanced meta-classifier, Knowledge-Based Systems 86 (2015) 46–56.
[33] D. I. H. Farías, V. Patti, P. Rosso, Irony detection in twitter: The role of affective content, ACM Trans. Internet Technol. 16 (2016). URL: https://doi.org/10.1145/2930663. doi:10.1145/2930663.