Emoji-Aware Attention-based Bi-directional GRU Network Model for Chinese Sentiment Analysis

Da Li¹, Rafal Rzepka¹,², Michal Ptaszynski³ and Kenji Araki¹
¹Graduate School of Information Science and Technology, Hokkaido University
²RIKEN Center for Advanced Intelligence Project (AIP)
³Department of Computer Science, Kitami Institute of Technology
{lida, rzepka, araki}@ist.hokudai.ac.jp, ptaszynski@cs.kitami-it.ac.jp

Abstract

Nowadays, social media has become an essential part of our lives. Pictograms (emoticons/emojis) are widely used in social media as a medium for visually expressing emotions. In this paper, we propose an emoji-aware attention-based GRU network model for sentiment analysis of Weibo, the most popular Chinese social media platform. First, we analyzed the usage of 67 emojis with facial expressions. By performing a polarity annotation with a new "humorous" type added, we confirmed that 23 emojis can be considered more humorous than positive or negative. On this basis, we applied the emoji polarities to an attention-based GRU network model for sentiment analysis of undersized labelled data. Our experimental results show that the proposed method can significantly improve the performance of sentiment polarity prediction on social media.

1 Introduction

Today, many people share their lives with their friends by posting status updates on Facebook, sharing their holiday photos on Instagram, or tweeting their views via Twitter or Weibo, the biggest Chinese social media network, launched in 2009. Social media data contains a vast amount of valuable sentiment information, not only for commercial use but also for psychology, cognitive linguistics and political science [Li et al., 2018a].

Over the past decade, sentiment analysis of microblogs has become an important area of research in the field of Natural Language Processing. The study of sentiment in English-language microblogs has undergone major developments in recent years [Peng et al., 2017]. Chinese sentiment analysis research, on the other hand, is still at an early stage [Wang et al., 2013], especially when it comes to utilizing lexicons and considering pictograms.

Recently, emojis have emerged as a new and widespread aspect of digital communication, spanning diverse social networks and spoken language. For example, "face with tears of joy" (an emoji that means that somebody is in an extremely good mood) was chosen as the 2015 word of the year by the Oxford Dictionary [Moschini, 2016]. In our opinion, ignoring pictograms in sentiment research is unjustifiable, because they convey significant emotional information and play an important role in expressing moods and opinions in social media [Novak et al., 2015; Guibon et al., 2016; Li et al., 2019].

Furthermore, we also noticed that when people use emojis, they tend to express a kind of humorous emotion which is difficult to classify as positive or negative. It seems that some pictograms are used just for fun, self-mockery or jocosity, expressing an implicit humor that might be characteristic of Chinese culture. Figure 1 shows an example of a Weibo microblog posted with emojis. In the third line of the post, ning meng ren¹ is a new word that appeared in early 2019 on Chinese social media and means "lemon man". Accordingly, to match this newly popular phrase, a lemon emoji with a sad face was added to the pictogram repertoire by social media companies in January 2019. This lemon with a sad face, also called "lemon man", expresses the same emotion as the slang ning meng ren – "sour grapes" or "jealousy of someone's success". Such an entry seems to convey a humorous nuance of a pessimistic attitude, and emojis seem to play an important role in expressing this kind of emotion. There is a high possibility that this phenomenon causes significant difficulty in the sentiment recognition task.

To address this phenomenon, in this paper we focus on the emojis used on Weibo in order to establish whether pictograms improve sentiment analysis by recognizing humorous entries which are difficult to polarize. Because emojis probably play an equal or sometimes even more important role in expressing emotion than textual features, we analyzed the characteristics of emojis, and report on their evaluation while dividing them into three categories: positive, negative and humorous. We also noticed that among the resources for Chinese social media sentiment analysis, labelled Weibo data sets containing emojis are extremely rare, which makes considering them in machine learning approaches difficult. To resolve this problem, we propose a novel attention-based GRU network model using emoji polarity to improve sentiment analysis on smaller annotated data sets. Our experimental results show that the proposed method can significantly improve the performance of sentiment polarity prediction.

¹In this paper we use italics to indicate romanization of the Chinese language (pinyin).

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Figure 1: Example of Weibo post with "lemon man" emojis.
2 Related Research

Tan and Zhang conducted an empirical study of sentiment categorization on Chinese documents [Tan and Zhang, 2008]. They tested four features – mutual information, information gain, chi-square and document frequency – and five learning algorithms: centroid classifier, k-Nearest Neighbor, Winnow classifier, Naïve Bayes (NB) and Support Vector Machine (SVM). Their results showed that information gain and SVM achieved the best results for sentiment classification when coupled with domain- or topic-dependent classifiers. Other researchers have combined the machine learning approach with the lexicon-based approach. [Chen et al., 2015] proposed a novel sentiment classification method which incorporated an existing Chinese sentiment lexicon into a convolutional neural network. The results showed that their approach outperforms the convolutional neural network (CNN) model using only word embedding features [Kim, 2014]. However, none of these approaches considered emojis.

In 2017, Felbo and colleagues [Felbo et al., 2017] proposed a powerful system utilizing emojis in a Twitter sentiment analysis model called DeepMoji. They trained a Bi-directional Long Short-Term Memory (Bi-LSTM) model on 1,246 million tweets containing at least one of 64 common emojis, and applied it to interpret the meaning behind online messages. DeepMoji is also one of the most advanced sarcasm-detecting models; because irony reverses the emotion of the literal text, sarcasm-detecting capability can play a significant role in sentiment analysis, especially in the case of social media. Although sarcasm and irony tend to convey negative emotions in general, we found that in Chinese social media (Weibo in our example), in addition to expressing positive and negative emotions, people tend to express a kind of humorous emotion that escapes the traditional bi-polarity.

In their research, [Li et al., 2018b] analyzed the usage of the emojis with facial expressions used on Weibo. They asked 12 Chinese native speakers to label these emojis with one of the three following categories: positive, negative and humorous. They confirmed that 23 emojis can be considered more humorous than positive or negative. On this basis, they used the emoji polarities (see Table 1) in a long short-term memory recurrent neural network (called EPLSTM) for sentiment analysis, also on undersized labelled data. [Chen et al., 2018] proposed a novel scheme for Twitter sentiment analysis with extra attention on emojis. They first learned bi-polarity emoji embeddings from positive and negative sentiment tweets separately, and then trained a sentiment classifier by attending on these bi-polarity emoji embeddings with an attention-based long short-term memory network (LSTM). Their experiments showed that the bi-polarity embedding was effective for extracting sentiment-aware embeddings of emojis. However, humorous social media posts were not considered in their paper.

Attention mechanisms have typically been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. [Luong et al., 2015] examined two simple and effective classes of attentional mechanism: a global approach which always uses all source words, and a local one that only looks at a subset of source words at a time. Their model, using different attention architectures, established a new state-of-the-art result.

Attention-based neural networks have also been applied to classification tasks. [Zhou et al., 2016] proposed attention-based bidirectional long short-term memory networks (AttBLSTM) to capture the most important semantic information in a sentence. Experimental results on the SemEval-2010 relation classification task showed that their method outperforms most existing methods. [Yang et al., 2016] proposed hierarchical attention networks (HAN) for classifying documents. Their model progressively builds a document vector by aggregating important words into sentence vectors and then aggregating important sentence vectors into document vectors. Experimental results demonstrate that their model performs significantly better than previous methods. These results indicate that the model is effective at picking out important words in our setting as well, so we decided to adopt it.

Table 1: Examples of emojis conveying humor typical for Chinese culture investigated by [Li et al., 2018b] and used in our work. (The emoji glyphs were rendered as images in the original and could not be recovered; each row gives the annotation distribution for one emoji.)

Emoji     Humorous (%)   Negative (%)   Positive (%)
(emoji)       41.7           25.0           33.3
(emoji)       58.3            0.0           41.7
(emoji)       66.7           33.3            0.0
(emoji)       91.7            8.3            0.0
(emoji)       58.3            0.0           41.7
(emoji)       83.3            0.0           16.7
(emoji)       58.3           25.0           16.7
(emoji)       66.7            8.3           25.0
(emoji)       66.7            8.3           25.0
(emoji)       41.7           33.3           25.0
(emoji)       75.0           25.0            0.0
(emoji)       58.3           41.7            0.0
(emoji)       50.0           50.0            0.0
(emoji)       50.0           33.3           16.7
(emoji)       75.0            8.3           16.7
(emoji)       58.3           33.3            8.3
(emoji)       75.0            0.0           25.0
3 Emoji-Aware Attention-based GRU Network Approach

Inspired by the above-mentioned works, in this paper we applied emoji polarity to an attention-based bi-directional GRU network model (EAGRU, where "E" stands for Emojis) for sentiment classification of undersized labelled Weibo data. The architecture of the proposed method for sentiment classification is shown in Figure 2.

Figure 2: The architecture of the proposed method.

3.1 GRU sequence encoder

The Gated Recurrent Unit [Bahdanau et al., 2014] is a gating mechanism that tracks the state of sequences without using separate memory cells. There are two types of gates: the reset gate r_t and the update gate z_t. Together they control how the information in the state is updated. At time t, the GRU computes the new state as:

    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t    (1)

This is a linear interpolation between the previous state h_{t-1} and the current candidate state \tilde{h}_t computed from new sequence information. The gate z_t decides how much past information is kept and how much new information is added, and is updated as:

    z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)    (2)

where x_t is the sequence vector at time t. The candidate state \tilde{h}_t is computed in a way similar to a traditional recurrent neural network (RNN):

    \tilde{h}_t = \tanh(W_h x_t + r_t \odot (U_h h_{t-1}) + b_h)    (3)

Here r_t is the reset gate which controls how much the past state contributes to the candidate state. If r_t is zero, the previous state is forgotten. The reset gate is updated as follows:

    r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)    (4)
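To make Equations (1)–(4) concrete, below is a minimal NumPy sketch of a single GRU step. The dimensions and random initialization are illustrative assumptions only, not the trained parameters of our model; the 300-dimensional input merely echoes our word2vec setting described later.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, params):
    """One GRU step implementing Equations (1)-(4).

    x_t:    input vector at time t
    h_prev: previous hidden state h_{t-1}
    params: dict with matrices W_*, U_* and biases b_* for gates z, r and
            the candidate state h
    """
    # Update gate (Eq. 2): how much new information enters the state.
    z_t = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])
    # Reset gate (Eq. 4): how much the past state feeds the candidate.
    r_t = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])
    # Candidate state (Eq. 3): vanilla-RNN-like, with the past gated by r_t.
    h_cand = np.tanh(params["W_h"] @ x_t + r_t * (params["U_h"] @ h_prev)
                     + params["b_h"])
    # New state (Eq. 1): interpolation between h_{t-1} and the candidate.
    return (1.0 - z_t) * h_prev + z_t * h_cand

# Toy dimensions for illustration only.
rng = np.random.default_rng(0)
d_in, d_hid = 300, 128
params = {f"{m}_{g}": rng.normal(scale=0.1,
                                 size=(d_hid, d_in if m == "W" else d_hid))
          for m in ("W", "U") for g in ("z", "r", "h")}
params.update({f"b_{g}": np.zeros(d_hid) for g in ("z", "r", "h")})

h = np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):  # a 5-word toy sentence
    h = gru_step(x, h, params)
```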
3.2 Word attention

Considering that Weibo entries are sentences of fewer than 140 words, in contrast to the related work of [Yang et al., 2016], our research focuses on sentence-level social media sentiment classification. Assuming that a sentence s_i contains T_i words, w_{it} with t \in [1, T] represents the words in the i-th sentence. Our proposed model projects a raw Weibo post into a vector representation, on which we build a classifier to perform sentiment classification. Below, we describe how we build the sentence-level vector progressively from word vectors using the attention structure.

Given a post with words w_{it}, t \in [1, T], we first vectorize the words through an embedding matrix W_e, so that x_{it} = W_e w_{it}. We use a bidirectional GRU [Bahdanau et al., 2014] to obtain word annotations that summarize information from both directions, and therefore incorporate contextual information into the annotation. The bidirectional GRU contains a forward GRU, which reads the sentence s_i from w_{i1} to w_{iT}, and a backward GRU, which reads from w_{iT} to w_{i1}:

    x_{it} = W_e w_{it}, t \in [1, T]    (5)
    \overrightarrow{h}_{it} = \overrightarrow{GRU}(x_{it}), t \in [1, T]    (6)
    \overleftarrow{h}_{it} = \overleftarrow{GRU}(x_{it}), t \in [T, 1]    (7)

We obtain an annotation for a given word w_{it} by concatenating the forward hidden state \overrightarrow{h}_{it} and the backward hidden state \overleftarrow{h}_{it}, i.e. h_{it} = [\overrightarrow{h}_{it}, \overleftarrow{h}_{it}], which summarizes the information of the whole sentence centered around w_{it}.

Not all words contribute equally to the representation of a Weibo entry's meaning. Hence, we introduce an attention mechanism to extract the words that are important to the meaning of the post, and aggregate the representations of those informative words into a sentence vector. Specifically:

    u_{it} = \tanh(W_w h_{it} + b_w)    (8)
    \alpha_{it} = \frac{\exp(u_{it}^T u_w)}{\sum_t \exp(u_{it}^T u_w)}    (9)
    s_i = \sum_t \alpha_{it} h_{it}    (10)

We first feed the word annotation h_{it} through a one-layer MLP to obtain u_{it} as a hidden representation of h_{it}; we then measure the importance of the word as the similarity of u_{it} to a word-level context vector u_w, and obtain a normalized importance weight \alpha_{it} through a softmax function. Finally, we compute the sentence vector s_i as a weighted sum of the word annotations based on these weights. The context vector u_w can be seen as a high-level representation of a fixed query ("which is the informative word?") over the words, similar to those used in memory networks [Sukhbaatar et al., 2015]. The word context vector u_w is randomly initialized and jointly learned during the training process.
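The following is a minimal NumPy sketch of Equations (8)–(10), assuming the bidirectional annotations h_{it} are stacked row-wise into a matrix; the shapes and the random inputs are illustrative assumptions. In the actual model u_w is learned jointly with the network, whereas here it stays at its random initialization.

```python
import numpy as np

def word_attention(H, W_w, b_w, u_w):
    """Sentence vector from word annotations via Eqs. (8)-(10).

    H:   (T, 2d) matrix of bidirectional GRU annotations h_it
    W_w: (a, 2d) projection matrix, b_w: (a,) bias
    u_w: (a,) word-level context vector
    """
    U = np.tanh(H @ W_w.T + b_w)           # Eq. (8): one-layer MLP
    scores = U @ u_w                        # similarity with context vector
    alpha = np.exp(scores - scores.max())   # Eq. (9): stabilized softmax
    alpha /= alpha.sum()
    return alpha @ H                        # Eq. (10): weighted sum -> s_i

# Toy example with made-up dimensions.
rng = np.random.default_rng(1)
T, d2, a = 7, 256, 100
s_i = word_attention(rng.normal(size=(T, d2)),
                     rng.normal(size=(a, d2)),
                     np.zeros(a),
                     rng.normal(size=a))
```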
The outputs of the softmax layer, S(z_i), are the probabilities of each category. The softmax function is defined as follows [Bridle, 1990; Merity et al., 2016]:

    S(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}    (11)

where the sum runs over all categories, and the input to the softmax layer, z_i, is defined as:

    z_i = w_i x + b_i    (12)

where w is the weight and b is the bias, both calculated during the model training process.

3.3 Emoji polarity

In order to predict the sentiment category of Weibo posts while considering the influence of emojis on Chinese social media sentiment analysis, we assign a hyper-parameter \lambda_1 to the probability given by the deep learning model's softmax output S(z_i). At the same time, we apply the labelled emojis from the work of [Li et al., 2018b] as the polarity P_e, and assign it a hyper-parameter \lambda_2. P becomes the final probability output of the classification:

    P = \lambda_1 S(z_i) + \lambda_2 P_e    (13)

where the sum of \lambda_1 and \lambda_2 is equal to 1. As a result, we obtain the sentiment probability of a Weibo post which takes the effect of emojis into account.
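Equation (13) amounts to a convex combination of the network's softmax distribution and the emoji polarity distribution. A minimal sketch follows, using the \lambda values from our experiments (Section 4.4); the example distributions are made up for illustration, with P_e shaped like a Table 1 row.

```python
import numpy as np

def combine(softmax_probs, emoji_polarity, lam1=0.4, lam2=0.6):
    """Final class probabilities P = lam1 * S(z) + lam2 * P_e (Eq. 13).

    lam1 + lam2 must equal 1 so that P stays a probability distribution.
    """
    assert abs(lam1 + lam2 - 1.0) < 1e-9
    return lam1 * np.asarray(softmax_probs) + lam2 * np.asarray(emoji_polarity)

# Hypothetical post; categories ordered (positive, negative, humorous).
s_z = [0.50, 0.10, 0.40]   # network softmax output S(z_i)
p_e = [0.25, 0.08, 0.67]   # polarity P_e of the post's emoji (Table 1 style)
print(combine(s_z, p_e))    # -> [0.35, 0.088, 0.562]; predicted: humorous
```

Note that the emoji evidence here flips the prediction from "positive" to "humorous", which is exactly the behavior the method is designed to achieve.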
4 Experiments

In order to verify the validity of our proposed method, we performed the series of experiments described below.

4.1 Preprocessing

Initializing word vectors with those obtained from an unsupervised neural language model is a popular method for improving performance in the absence of a large supervised training set. For our experiment we collected a large dataset (7.6 million posts) through the Weibo API, covering May 2015 to July 2017, to be used for calculating word embeddings. First, we deleted images and videos, treating them as noise. Second, we used the Python Chinese word segmentation module Jieba² to segment the sentences of the microblogs, and fed the segmentation results into the word2vec model [Mikolov et al., 2013] for training word vectors. The vectors have a dimensionality of 300 and were trained using the continuous skip-gram model.

When we collected the microblog data, we discovered that Weibo emojis are converted by the API into textual tags; for example, the smile emoji is converted into its Chinese tag meaning "smile". This gave us the possibility of representing emojis in the word embedding. Therefore, we transformed the 109 Weibo emojis (see Figure 3) into Chinese characters and converted them into textual features for word embedding. Several examples are shown in Table 2.

Next, we collected 4,000 Weibo posts containing eight ambiguous emojis (their glyphs, rendered as images in the original, are not reproducible here), ensuring that each entry had only one pictogram of a given type (cases with more emojis of the same type were allowed). To use these posts as our training data, we asked three Chinese native speakers to annotate them into three categories: "positive", "negative" and "humorous". After one annotator labelled the polarities of all posts, the two other native speakers verified the correctness of his annotations. Whenever there was a disagreement, the final polarity was decided by all three through discussion.

²https://github.com/fxsjy/jieba

Figure 3: 109 Weibo emojis which were converted into Chinese characters.

Table 2: Examples of Textual Features of Emojis. (The emoji glyphs and their Chinese textual features were rendered as images/characters lost in extraction; the recoverable column is the emotion/implication.)

Emoji     Textual Feature   Emotion/Implication
(emoji)   (Chinese tag)     "smile"
(emoji)   (Chinese tag)     "applause"
(emoji)   (Chinese tag)     "face with tears of joy"
(emoji)   (Chinese tag)     "wink"
(emoji)   (Chinese tag)     "greedy"
(emoji)   (Chinese tag)     "speechless/awkward"
(emoji)   (Chinese tag)     "sweat"
(emoji)   (Chinese tag)     "nosepick"
(emoji)   (Chinese tag)     "snort"
(emoji)   (Chinese tag)     "upset/feel wronged"
(emoji)   (Chinese tag)     "pathetic"
(emoji)   (Chinese tag)     "disappointment"
(emoji)   (Chinese tag)     "weep"
(emoji)   (Chinese tag)     "shy"
(emoji)   (Chinese tag)     "filthy"
(emoji)   (Chinese tag)     "love face"
(emoji)   (Chinese tag)     "kissy face"
(emoji)   (Chinese tag)     "leer"
(emoji)   (Chinese tag)     "lick screen"
(emoji)   (Chinese tag)     "dog leash"
(emoji)   (Chinese tag)     "smugshrug"
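A sketch of the preprocessing pipeline described above, assuming the posts have already been downloaded to a text file (one post per line, a hypothetical name) with images/videos removed and emojis replaced by their textual tags. Jieba is named in the text; the use of gensim for word2vec, its >= 4.0 parameter names, and the window/min_count settings are our assumptions.

```python
import jieba
from gensim.models import Word2Vec

CORPUS = "weibo_posts.txt"  # hypothetical file: one cleaned post per line

def segmented_posts(path):
    """Yield each post as a list of tokens segmented by Jieba."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = jieba.lcut(line.strip())
            if tokens:
                yield tokens

# Continuous skip-gram (sg=1) with 300-dimensional vectors, as in the paper.
model = Word2Vec(
    sentences=list(segmented_posts(CORPUS)),
    vector_size=300, sg=1, window=5, min_count=5, workers=4,
)
model.save("weibo_word2vec.model")
```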
deep learning approaches where emoji polarities were con- Error analysis showed that some posts were wrongly pre- sidered, and the results of our proposed method. Table 5 de- dicted due to ambiguous usage of emojis which brought scribes the comparison of F1-scores of the above-mentioned clearly negative impact on the results. In Figure 5 we show methods. an example of such misclassification into “positive” cate- The results proved that our proposed method is more effec- gory annotated as “humorous” by annotators. was con- tive than traditional neural network-based solutions. Limited sidered as more positive than humorous by our annotators to small annotated data, the precision of the sentiment clas- (67%/0%/33%, positive/negative/humorous). It seems that sification was relatively low, but thanks to considering emoji, this particular user wrote a joke just for fun, however, our the F1-score of each category outperformed previous meth- proposed method was misguided by this “smirking” emoji. ods without considering emojis by 6.93 (humorous), 7.41 Therefore, we plan to increase the number of evaluators for (negative) and 7.19 (positive) percentage points. Our pro- annotating Weibo emojis in fine-grained humorous emotion posed emoji-aware attention-based GRU network approach to enhance the reliability of the polarity of emojis. has improved the performance showing that low-cost, small- scale data labeling is sufficient to outperform widely used state-of-the-art when emoji information is added to the deep 6 Conclusions and Future Work learning process. In this paper, we applied information on sentiment of emo- jis to a attention-based GRU network model for sentiment 5 Discussion analysis of undersized labelled data. Our experimental results show that the proposed method can significantly improve the In our proposed approach, we paid attention to emojis in mi- F1-score for predicting sentiment polarity on Weibo. croblogs and investigated how adding pictogram features to a For improving the performance of our proposed method, attention-based GRU network model for recognizing humor- in the near future we are going to increase the amount of la- ous posts which are problematic in sentiment analysis. Fig- belled data to acquire the hyperparameters automatically by ure 4 presents an example of a microblog which was correctly machine learning approaches. Furthermore, we need to in- classified by our proposed method as “humorous” while the crease the number of evaluators for annotating Weibo emojis baseline recognized it incorrectly as a positive one. and Weibo data for more fine-grained categorization of hu- This and similar entries were usually posted as a comment morous posts to enhance the reliability of our experiments. a GIF or video showing a referee who displays her or his skills We also plan to add image processing for classifying stickers in basketball by performing a slam dunk. This post seems to which also seem to convey rich emotional information. express an implied humorous nuance of exaggerated surprise Our ultimate goal is to investigate how much the newly when the poster saw how good the referee was. Because this introduced features are beneficial for sentiment analysis by 16 Table 3: Comparison results of three deep learning approaches not considering emojis (AttBiGRU stands for attention-based bi-directional GRU). 
4.3 Baselines

We compare our EAGRU method with several baseline methods, including traditional deep learning approaches such as the convolutional neural network and the long short-term memory recurrent neural network.

Convolutional Neural Network

Convolutional neural networks (CNN) utilize layers with convolving filters that are applied to local features [LeCun et al., 1998]. Originally invented for computer vision, CNN models have subsequently been shown to be effective for NLP, achieving superior results in semantic parsing [Yih et al., 2014], search query retrieval [Shen et al., 2014], sentence modeling [Kalchbrenner et al., 2014] and other traditional NLP tasks. We experimented with the CNN architecture proposed in [Kim, 2014] and applied our emoji polarities to this model. The CNN model considering Emoji Polarities (EPCNN) was trained for 10 epochs with a dropout rate of 0.5 (the same as in the proposed method); the filter size was 32 and the number of strides was 2. As activation functions we generally used ReLU, and the network output activation function was softmax.

Long Short-Term Memory Recurrent Neural Network

The long short-term memory recurrent neural network (LSTM) [Hochreiter and Schmidhuber, 1997] is well suited to classifying, processing and making predictions based on time-series data, since there can be lags of unknown duration between important events in a time series [Eyben et al., 2010]. We utilized the EPLSTM proposed in [Li et al., 2018b], trained for 10 epochs with a dropout rate of 0.5, identical to our proposed method. The validity of the model was examined by the holdout method (90%/10%, training/validation). The network output activation function was also softmax.

4.4 Performance Test

Using the trained word2vec model, we passed the word vectors of the training data into the three deep learning models for training. We collected and annotated 180 Weibo entries containing the eight emojis mentioned above as a testing set, deleting images and videos. We then used the proposed method to calculate the probabilities of each category and computed the precision, recall and F1-score. Because we assumed that in emotion expression emojis might play an equal or greater role than text, in our experiment we set the hyper-parameters \lambda_1 and \lambda_2 to 0.4 and 0.6 respectively.
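The per-category precision, recall and F1-scores reported in Tables 3–5 below can be computed as in the following sketch. The use of scikit-learn and the toy label lists are our assumptions for illustration; the original experiments do not specify the tooling.

```python
from sklearn.metrics import classification_report

labels = ["positive", "negative", "humorous"]

# Hypothetical gold labels and predictions for the 180-post test set.
y_true = ["humorous", "positive", "negative", "humorous"]  # ... 180 entries
y_pred = ["humorous", "positive", "humorous", "humorous"]  # ... 180 entries

# Per-category precision, recall and F1-score, as in Tables 3 and 4.
print(classification_report(y_true, y_pred, labels=labels, digits=4))
```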
We compared the results of sentiment classification by the deep learning approaches with and without considering emoji polarities. The results of the deep learning models without emojis are shown in Table 3. Table 4 presents the results of the two traditional deep learning approaches with emoji polarities considered, together with the results of our proposed method. Table 5 compares the F1-scores of the above-mentioned methods.

Table 3: Comparison results of three deep learning approaches not considering emojis (AttBiGRU stands for attention-based bi-directional GRU).

Categories   Evaluation   LSTM     CNN      AttBiGRU
Humorous     Precision    63.46%   64.71%   77.78%
             Recall       77.65%   77.65%   65.88%
             F1-score     69.84%   70.59%   71.33%
Negative     Precision    62.79%   70.45%   70.83%
             Recall       61.36%   70.45%   77.27%
             F1-score     62.07%   70.45%   73.91%
Positive     Precision    87.88%   88.23%   65.00%
             Recall       56.86%   58.82%   76.47%
             F1-score     69.05%   70.58%   70.27%

Table 4: Comparison results of three deep learning approaches considering emoji polarities.

Categories   Evaluation   EPLSTM   EPCNN    EAGRU
Humorous     Precision    66.02%   69.52%   82.89%
             Recall       80.00%   85.88%   74.12%
             F1-score     72.34%   76.84%   78.26%*
Negative     Precision    65.91%   79.48%   78.72%
             Recall       65.91%   70.45%   84.09%
             F1-score     65.91%   74.69%   81.32%*
Positive     Precision    90.91%   88.89%   73.68%
             Recall       58.82%   62.74%   82.35%
             F1-score     71.43%   73.56%   77.77%*
*p < 0.05

Table 5: F1-score comparison for deep learning approaches considering emoji polarities, compared to the best method not using pictograms (AttBiGRU).

             Humorous   Negative   Positive
AttBiGRU     71.33%     73.91%     70.27%
EPLSTM       72.34%     65.91%     71.43%
EPCNN        76.84%     74.69%     73.56%
EAGRU        78.26%     81.32%     77.77%

The results proved that our proposed method is more effective than traditional neural network-based solutions. Limited by the small annotated data set, the precision of the sentiment classification was relatively low, but thanks to considering emojis, the F1-score of each category outperformed the previous methods without emojis by 6.93 (humorous), 7.41 (negative) and 7.19 (positive) percentage points. Our proposed emoji-aware attention-based GRU network approach improved performance, showing that low-cost, small-scale data labeling is sufficient to outperform the widely used state of the art when emoji information is added to the deep learning process.

5 Discussion

In our proposed approach, we paid attention to emojis in microblogs and investigated how adding pictogram features to an attention-based GRU network model helps in recognizing humorous posts, which are problematic in sentiment analysis. Figure 4 presents an example of a microblog which was correctly classified by our proposed method as "humorous" while the baseline incorrectly recognized it as positive. This and similar entries were usually posted as comments on a GIF or video showing a referee who displays her or his basketball skills by performing a slam dunk. The post seems to express an implied humorous nuance of exaggerated surprise at how good the referee was. Because this expression is accompanied by an emoji, the emoji improves the classification performance and allows the implicit humorous meaning to be predicted.

Figure 4: Example of correct classification of humorous post.

Error analysis showed that some posts were wrongly predicted due to ambiguous usage of emojis, which had a clearly negative impact on the results. In Figure 5 we show an example of such a misclassification into the "positive" category of a post annotated as "humorous" by the annotators. The emoji in question was considered more positive than humorous by our annotators (67%/0%/33%, positive/negative/humorous). It seems that this particular user wrote a joke just for fun; however, our proposed method was misled by this "smirking" emoji. Therefore, we plan to increase the number of evaluators annotating Weibo emojis for fine-grained humorous emotion, to enhance the reliability of the emoji polarities.

Figure 5: Example of wrong classification into "positive" category.

6 Conclusions and Future Work

In this paper, we applied information on the sentiment of emojis to an attention-based GRU network model for sentiment analysis of undersized labelled data. Our experimental results show that the proposed method can significantly improve the F1-score when predicting sentiment polarity on Weibo.

To improve the performance of our proposed method, in the near future we are going to increase the amount of labelled data so that the hyper-parameters can be acquired automatically by machine learning approaches. Furthermore, we need to increase the number of evaluators annotating Weibo emojis and Weibo data for a more fine-grained categorization of humorous posts, to enhance the reliability of our experiments. We also plan to add image processing for classifying stickers, which also seem to convey rich emotional information. Our ultimate goal is to investigate how beneficial the newly introduced features are for sentiment analysis by feeding them to a deep learning model, which should allow us to construct a high-quality sentiment recognizer covering a wider spectrum of sentiment in the Chinese language.

7 Acknowledgment

This work was supported by JSPS KAKENHI Grant Number 17K00295.
References

[Bahdanau et al., 2014] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

[Bridle, 1990] John S. Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Neurocomputing, pages 227–236. Springer, 1990.

[Chen et al., 2015] Zhao Chen, Ruifeng Xu, Lin Gui, and Qin Lu. Combining convolution neural network and word sentiment sequence features for Chinese text sentiment analysis. Journal of Chinese Information Processing, 2015.

[Chen et al., 2018] Yuxiao Chen, Jianbo Yuan, Quanzeng You, and Jiebo Luo. Twitter sentiment analysis via bi-sense emoji embedding and attention-based LSTM. In 2018 ACM Multimedia Conference on Multimedia Conference, pages 117–125. ACM, 2018.

[Eyben et al., 2010] Florian Eyben, Martin Wöllmer, Alex Graves, Björn Schuller, Ellen Douglas-Cowie, and Roddy Cowie. On-line emotion recognition in a 3-d activation-valence-time continuum using acoustic and linguistic cues. Journal on Multimodal User Interfaces, 3(1-2):7–19, 2010.

[Felbo et al., 2017] Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv preprint arXiv:1708.00524, 2017.

[Guibon et al., 2016] Gaël Guibon, Magalie Ochs, and Patrice Bellot. From emojis to sentiment analysis. In WACAI 2016, 2016.

[Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[Kalchbrenner et al., 2014] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188, 2014.

[Kim, 2014] Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.

[LeCun et al., 1998] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[Li et al., 2018a] Da Li, Rafal Rzepka, and Kenji Araki. Preliminary analysis of Weibo emojis for sentiment analysis of Chinese social media. In Proceedings of the 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018.

[Li et al., 2018b] Da Li, Rafal Rzepka, Michal Ptaszynski, and Kenji Araki. Emoticon-aware recurrent neural network model for Chinese sentiment analysis. In The Ninth IEEE International Conference on Awareness Science and Technology (iCAST 2018), 2018.

[Li et al., 2019] Da Li, Rafal Rzepka, Michal Ptaszynski, and Kenji Araki. A novel machine learning-based sentiment analysis method for Chinese social media considering Chinese slang lexicon and emoticons. In The AAAI-19 Workshop on Affective Content Analysis, AffCon 2019, 2019.

[Luong et al., 2015] Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.

[Merity et al., 2016] Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016.

[Mikolov et al., 2013] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[Moschini, 2016] Ilaria Moschini. The "face with tears of joy" emoji: a socio-semiotic and multimodal insight into a Japan-America mash-up. HERMES – Journal of Language and Communication in Business, (55):11–25, 2016.

[Novak et al., 2015] Petra Kralj Novak, Jasmina Smailović, Borut Sluban, and Igor Mozetič. Sentiment of emojis. PLoS ONE, 10(12):e0144296, 2015.

[Peng et al., 2017] Haiyun Peng, Erik Cambria, and Amir Hussain. A review of sentiment analysis research in Chinese language. Cognitive Computation, 9(4):423–435, 2017.

[Shen et al., 2014] Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. Learning semantic representations using convolutional neural networks for web search. In Proceedings of the 23rd International Conference on World Wide Web, pages 373–374. ACM, 2014.

[Sukhbaatar et al., 2015] Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. End-to-end memory networks. In Advances in Neural Information Processing Systems, pages 2440–2448, 2015.

[Tan and Zhang, 2008] Songbo Tan and Jin Zhang. An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications, 34(4):2622–2629, 2008.

[Wang et al., 2013] Xinyu Wang, Chunhong Zhang, Yang Ji, Li Sun, Leijia Wu, and Zhana Bao. A depression detection model based on sentiment analysis in micro-blog social network. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 201–213. Springer, 2013.

[Yang et al., 2016] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, 2016.

[Yih et al., 2014] Wen-tau Yih, Xiaodong He, and Christopher Meek. Semantic parsing for single-relation question answering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 643–648, 2014.

[Zhou et al., 2016] Peng Zhou, Wei Shi, Jun Tian, Zhenyu Qi, Bingchen Li, Hongwei Hao, and Bo Xu. Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 207–212, 2016.