Emoji-Aware Attention-based Bi-directional GRU Network Model for Chinese Sentiment Analysis

Da Li¹, Rafal Rzepka¹,², Michal Ptaszynski³ and Kenji Araki¹
¹Graduate School of Information Science and Technology, Hokkaido University
²RIKEN Center for Advanced Intelligence Project (AIP)
³Department of Computer Science, Kitami Institute of Technology
{lida, rzepka, araki}@ist.hokudai.ac.jp, ptaszynski@cs.kitami-it.ac.jp

Abstract

Nowadays, social media has become an essential part of our lives. Pictograms (emoticons/emojis) are widely used in social media as a medium for visually expressing emotions. In this paper, we propose an emoji-aware attention-based GRU network model for sentiment analysis of Weibo, the most popular Chinese social media platform. First, we analyzed the usage of 67 emojis with facial expressions. By performing a polarity annotation with a new "humorous" type added, we confirmed that 23 emojis can be considered more humorous than positive or negative. On this basis, we applied the emoji polarities to an attention-based GRU network model for sentiment analysis of undersized labelled data. Our experimental results show that the proposed method can significantly improve the performance of sentiment polarity prediction on social media.

1 Introduction

Today, many people share their lives with their friends by posting status updates on Facebook, sharing their holiday photos on Instagram, or tweeting their views via Twitter or Weibo, the biggest Chinese social media network, launched in 2009. Social media data contains a vast amount of valuable sentiment information, not only for commercial use but also for psychology, cognitive linguistics and political science [Li et al., 2018a].

Over the past decade, sentiment analysis of microblogs has become an important area of research in the field of Natural Language Processing. The study of sentiment in English-language microblogs has undergone major developments in recent years [Peng et al., 2017]. Chinese sentiment analysis research, on the other hand, is still at an early stage [Wang et al., 2013], especially when it comes to utilizing lexicons and considering pictograms.

Recently, emojis have emerged as a new and widespread aspect of digital communication, spanning diverse social networks and spoken language. For example, "face with tears of joy" (an emoji that means that somebody is in an extremely good mood) was chosen as the 2015 word of the year by the Oxford Dictionary [Moschini, 2016]. In our opinion, ignoring pictograms in sentiment research is unjustifiable, because they convey significant emotional information and play an important role in expressing moods and opinions in social media [Novak et al., 2015; Guibon et al., 2016; Li et al., 2019].

Furthermore, we also noticed that when people use emojis, they tend to express a kind of humorous emotion which is difficult to classify as positive or negative. It seems that some pictograms are used just for fun, self-mockery or jocosity, expressing an implicit humor that might be characteristic of Chinese culture. Figure 1 shows an example of a Weibo microblog posted with emojis. In the third line of the post, ning meng ren¹ is a new word that appeared in early 2019 on Chinese social media and means "lemon man". Accordingly, to match this newly popular phrase, a lemon emoji with a sad face was added to the pictogram repertoire by social media companies in January 2019. This lemon with a sad face, also called "lemon man", expresses the same emotion as the slang ning meng ren – "sour grapes" or "jealousy of someone's success". Such an entry seems to convey a humorous nuance of a pessimistic attitude, and emojis seem to play an important role in expressing this kind of emotion. There is a high possibility that this phenomenon causes significant difficulty in the sentiment recognition task.

To address this phenomenon, in this paper we focus on the emojis used on Weibo in order to establish whether pictograms improve sentiment analysis by recognizing humorous entries which are difficult to polarize. Because emojis probably play an equal or sometimes even more important role in expressing emotion than textual features, we analyzed the characteristics of emojis, and report on their evaluation while dividing them into three categories: positive, negative and humorous. We also noticed that among the resources for Chinese social media sentiment analysis, labelled Weibo data sets containing emojis are extremely rare, which makes considering them in machine learning approaches difficult. To resolve this problem, we propose a novel attention-based GRU network model using emoji polarity to improve sentiment analysis on smaller annotated data sets. Our experimental results show that the proposed method can significantly improve the performance of sentiment polarity prediction.

¹In this paper we use italics to indicate romanization of the Chinese language (pinyin).

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Figure 1: Example of Weibo post with "lemon man" emojis.
2 Related Research

Tan and Zhang conducted an empirical study of sentiment categorization on Chinese documents [Tan and Zhang, 2008]. They tested four features – mutual information, information gain, chi-square and document frequency – and five learning algorithms: centroid classifier, k-Nearest Neighbor, Winnow classifier, Naïve Bayes (NB) and Support Vector Machine (SVM). Their results showed that information gain and SVM achieved the best results for sentiment classification when coupled with domain- or topic-dependent classifiers. Other researchers have combined the machine learning approach with the lexicon-based approach. [Chen et al., 2015] proposed a novel sentiment classification method which incorporated an existing Chinese sentiment lexicon into a convolutional neural network. The results showed that their approach outperforms the convolutional neural network (CNN) model using only word embedding features [Kim, 2014]. However, none of these approaches considered emojis.

In 2017, Felbo and colleagues [Felbo et al., 2017] proposed a powerful system utilizing emojis in a Twitter sentiment analysis model called DeepMoji. They trained a Bi-directional Long Short-Term Memory (Bi-LSTM) model on 1,246 million tweets containing at least one of 64 common emojis, and applied it to interpret the meaning behind online messages. DeepMoji is also one of the most advanced sarcasm-detecting models; because irony reverses the emotion of the literal text, sarcasm-detecting capability can play a significant role in sentiment analysis, especially in the case of social media. Although sarcasm and irony tend to convey negative emotions in general, we found that in Chinese social media (Weibo in our example), in addition to expressing positive and negative emotions, people tend to express a kind of humorous emotion that escapes the traditional bi-polarity.

In their research, [Li et al., 2018b] analyzed the usage of the emojis with facial expressions used on Weibo. They asked 12 Chinese native speakers to label these emojis with one of the three following categories: positive, negative and humorous. They confirmed that 23 emojis can be considered more humorous than positive or negative. On this basis, they used the emoji polarities (see Table 1) in a long short-term memory recurrent neural network (called EPLSTM) for sentiment analysis, also on undersized labelled data. [Chen et al., 2018] proposed a novel scheme for Twitter sentiment analysis with extra attention on emojis. They first learned bi-polarity emoji embeddings from positive and negative sentiment tweets separately, and then trained a sentiment classifier by attending on these bi-polarity emoji embeddings with an attention-based long short-term memory network (LSTM). Their experiments showed that the bi-polarity embedding was effective for extracting sentiment-aware embeddings of emojis. However, humorous social media posts were not considered in their paper.

Attention mechanisms have typically been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. [Luong et al., 2015] examined two simple and effective classes of attentional mechanism: a global approach which always uses all source words, and a local one that only looks at a subset of source words at a time. Their model, using different attention architectures, established a new state-of-the-art result.

Attention-based neural networks have also been applied to classification tasks. [Zhou et al., 2016] proposed attention-based bidirectional long short-term memory networks (AttBLSTM) to capture the most important semantic information in a sentence. Experimental results on the SemEval-2010 relation classification task showed that their method outperforms most existing methods. [Yang et al., 2016] proposed hierarchical attention networks (HAN) for classifying documents. Their model progressively builds a document vector by aggregating important words into sentence vectors and then aggregating important sentence vectors into document vectors. Experimental results demonstrate that their model performs significantly better than previous methods. These results indicate that the model is effective at picking out important words in our setting as well, so we decided to adopt it.

Table 1: Examples of emojis conveying humor typical for Chinese culture investigated by [Li et al., 2018b] and used in our work. (The emoji glyphs were rendered as images in the original and could not be recovered; each row gives the annotation distribution for one emoji.)

Emoji     Humorous (%)   Negative (%)   Positive (%)
(emoji)       41.7           25.0           33.3
(emoji)       58.3            0.0           41.7
(emoji)       66.7           33.3            0.0
(emoji)       91.7            8.3            0.0
(emoji)       58.3            0.0           41.7
(emoji)       83.3            0.0           16.7
(emoji)       58.3           25.0           16.7
(emoji)       66.7            8.3           25.0
(emoji)       66.7            8.3           25.0
(emoji)       41.7           33.3           25.0
(emoji)       75.0           25.0            0.0
(emoji)       58.3           41.7            0.0
(emoji)       50.0           50.0            0.0
(emoji)       50.0           33.3           16.7
(emoji)       75.0            8.3           16.7
(emoji)       58.3           33.3            8.3
(emoji)       75.0            0.0           25.0
3 Emoji-Aware Attention-based GRU Network Approach

Inspired by the above-mentioned works, in this paper we applied emoji polarity to an attention-based bi-directional GRU network model (EAGRU, where "E" stands for Emojis) for sentiment classification of undersized labelled Weibo data. The architecture of the proposed method for sentiment classification is shown in Figure 2.

Figure 2: The architecture of the proposed method.

3.1 GRU sequence encoder

The Gated Recurrent Unit [Bahdanau et al., 2014] is a gating mechanism that tracks the state of sequences without using separate memory cells. There are two types of gates: the reset gate r_t and the update gate z_t. Together they control how the information in the state is updated. At time t, the GRU computes the new state as:

    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t    (1)

This is a linear interpolation between the previous state h_{t-1} and the current candidate state \tilde{h}_t computed from new sequence information. The gate z_t decides how much past information is kept and how much new information is added, and is updated as:

    z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)    (2)

where x_t is the sequence vector at time t. The candidate state \tilde{h}_t is computed in a way similar to a traditional recurrent neural network (RNN):

    \tilde{h}_t = \tanh(W_h x_t + r_t \odot (U_h h_{t-1}) + b_h)    (3)

Here r_t is the reset gate which controls how much the past state contributes to the candidate state. If r_t is zero, the previous state is forgotten. The reset gate is updated as follows:

    r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)    (4)
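To make Equations (1)–(4) concrete, below is a minimal NumPy sketch of a single GRU step. The dimensions and random initialization are illustrative assumptions only, not the trained parameters of our model; the 300-dimensional input merely echoes our word2vec setting described later.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, params):
    """One GRU step implementing Equations (1)-(4).

    x_t:    input vector at time t
    h_prev: previous hidden state h_{t-1}
    params: dict with matrices W_*, U_* and biases b_* for gates z, r and
            the candidate state h
    """
    # Update gate (Eq. 2): how much new information enters the state.
    z_t = sigmoid(params["W_z"] @ x_t + params["U_z"] @ h_prev + params["b_z"])
    # Reset gate (Eq. 4): how much the past state feeds the candidate.
    r_t = sigmoid(params["W_r"] @ x_t + params["U_r"] @ h_prev + params["b_r"])
    # Candidate state (Eq. 3): vanilla-RNN-like, with the past gated by r_t.
    h_cand = np.tanh(params["W_h"] @ x_t + r_t * (params["U_h"] @ h_prev)
                     + params["b_h"])
    # New state (Eq. 1): interpolation between h_{t-1} and the candidate.
    return (1.0 - z_t) * h_prev + z_t * h_cand

# Toy dimensions for illustration only.
rng = np.random.default_rng(0)
d_in, d_hid = 300, 128
params = {f"{m}_{g}": rng.normal(scale=0.1,
                                 size=(d_hid, d_in if m == "W" else d_hid))
          for m in ("W", "U") for g in ("z", "r", "h")}
params.update({f"b_{g}": np.zeros(d_hid) for g in ("z", "r", "h")})

h = np.zeros(d_hid)
for x in rng.normal(size=(5, d_in)):  # a 5-word toy sentence
    h = gru_step(x, h, params)
```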
3.2 Word attention

Considering that Weibo entries are sentences of fewer than 140 words, in contrast to the related work of [Yang et al., 2016], our research focuses on sentence-level social media sentiment classification. Assuming that a sentence s_i contains T_i words, w_{it} with t \in [1, T] represents the words in the i-th sentence. Our proposed model projects a raw Weibo post into a vector representation, on which we build a classifier to perform sentiment classification. Below, we describe how we build the sentence-level vector progressively from word vectors using the attention structure.

Given a post with words w_{it}, t \in [1, T], we first vectorize the words through an embedding matrix W_e, so that x_{it} = W_e w_{it}. We use a bidirectional GRU [Bahdanau et al., 2014] to obtain word annotations that summarize information from both directions, and therefore incorporate contextual information into the annotation. The bidirectional GRU contains a forward GRU, which reads the sentence s_i from w_{i1} to w_{iT}, and a backward GRU, which reads from w_{iT} to w_{i1}:

    x_{it} = W_e w_{it}, t \in [1, T]    (5)
    \overrightarrow{h}_{it} = \overrightarrow{GRU}(x_{it}), t \in [1, T]    (6)
    \overleftarrow{h}_{it} = \overleftarrow{GRU}(x_{it}), t \in [T, 1]    (7)

We obtain an annotation for a given word w_{it} by concatenating the forward hidden state \overrightarrow{h}_{it} and the backward hidden state \overleftarrow{h}_{it}, i.e. h_{it} = [\overrightarrow{h}_{it}, \overleftarrow{h}_{it}], which summarizes the information of the whole sentence centered around w_{it}.

Not all words contribute equally to the representation of a Weibo entry's meaning. Hence, we introduce an attention mechanism to extract the words that are important to the meaning of the post, and aggregate the representations of those informative words into a sentence vector. Specifically:

    u_{it} = \tanh(W_w h_{it} + b_w)    (8)
    \alpha_{it} = \frac{\exp(u_{it}^T u_w)}{\sum_t \exp(u_{it}^T u_w)}    (9)
    s_i = \sum_t \alpha_{it} h_{it}    (10)

We first feed the word annotation h_{it} through a one-layer MLP to obtain u_{it} as a hidden representation of h_{it}; we then measure the importance of the word as the similarity of u_{it} to a word-level context vector u_w, and obtain a normalized importance weight \alpha_{it} through a softmax function. Finally, we compute the sentence vector s_i as a weighted sum of the word annotations based on these weights. The context vector u_w can be seen as a high-level representation of a fixed query ("which is the informative word?") over the words, similar to those used in memory networks [Sukhbaatar et al., 2015]. The word context vector u_w is randomly initialized and jointly learned during the training process.
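The following is a minimal NumPy sketch of Equations (8)–(10), assuming the bidirectional annotations h_{it} are stacked row-wise into a matrix; the shapes and the random inputs are illustrative assumptions. In the actual model u_w is learned jointly with the network, whereas here it stays at its random initialization.

```python
import numpy as np

def word_attention(H, W_w, b_w, u_w):
    """Sentence vector from word annotations via Eqs. (8)-(10).

    H:   (T, 2d) matrix of bidirectional GRU annotations h_it
    W_w: (a, 2d) projection matrix, b_w: (a,) bias
    u_w: (a,) word-level context vector
    """
    U = np.tanh(H @ W_w.T + b_w)           # Eq. (8): one-layer MLP
    scores = U @ u_w                        # similarity with context vector
    alpha = np.exp(scores - scores.max())   # Eq. (9): stabilized softmax
    alpha /= alpha.sum()
    return alpha @ H                        # Eq. (10): weighted sum -> s_i

# Toy example with made-up dimensions.
rng = np.random.default_rng(1)
T, d2, a = 7, 256, 100
s_i = word_attention(rng.normal(size=(T, d2)),
                     rng.normal(size=(a, d2)),
                     np.zeros(a),
                     rng.normal(size=a))
```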
The outputs of the softmax layer, S(z_i), are the probabilities of each category. The softmax function is defined as follows [Bridle, 1990; Merity et al., 2016]:

    S(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}    (11)

where the sum runs over all categories, and the input to the softmax layer, z_i, is defined as:

    z_i = w_i x + b_i    (12)

where w is the weight and b is the bias, both calculated during the model training process.

3.3 Emoji polarity

In order to predict the sentiment category of Weibo posts while considering the influence of emojis on Chinese social media sentiment analysis, we assign a hyper-parameter \lambda_1 to the probability given by the deep learning model's softmax output S(z_i). At the same time, we apply the labelled emojis from the work of [Li et al., 2018b] as the polarity P_e, and assign it a hyper-parameter \lambda_2. P becomes the final probability output of the classification:

    P = \lambda_1 S(z_i) + \lambda_2 P_e    (13)

where the sum of \lambda_1 and \lambda_2 is equal to 1. As a result, we obtain the sentiment probability of a Weibo post which takes the effect of emojis into account.
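Equation (13) amounts to a convex combination of the network's softmax distribution and the emoji polarity distribution. A minimal sketch follows, using the \lambda values from our experiments (Section 4.4); the example distributions are made up for illustration, with P_e shaped like a Table 1 row.

```python
import numpy as np

def combine(softmax_probs, emoji_polarity, lam1=0.4, lam2=0.6):
    """Final class probabilities P = lam1 * S(z) + lam2 * P_e (Eq. 13).

    lam1 + lam2 must equal 1 so that P stays a probability distribution.
    """
    assert abs(lam1 + lam2 - 1.0) < 1e-9
    return lam1 * np.asarray(softmax_probs) + lam2 * np.asarray(emoji_polarity)

# Hypothetical post; categories ordered (positive, negative, humorous).
s_z = [0.50, 0.10, 0.40]   # network softmax output S(z_i)
p_e = [0.25, 0.08, 0.67]   # polarity P_e of the post's emoji (Table 1 style)
print(combine(s_z, p_e))    # -> [0.35, 0.088, 0.562]; predicted: humorous
```

Note that the emoji evidence here flips the prediction from "positive" to "humorous", which is exactly the behavior the method is designed to achieve.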
4 Experiments

In order to verify the validity of our proposed method, we performed the series of experiments described below.

4.1 Preprocessing

Initializing word vectors with those obtained from an unsupervised neural language model is a popular method for improving performance in the absence of a large supervised training set. For our experiment we collected a large dataset (7.6 million posts) through the Weibo API, covering May 2015 to July 2017, to be used for calculating word embeddings. First, we deleted images and videos, treating them as noise. Second, we used the Python Chinese word segmentation module Jieba² to segment the sentences of the microblogs, and fed the segmentation results into the word2vec model [Mikolov et al., 2013] for training word vectors. The vectors have a dimensionality of 300 and were trained using the continuous skip-gram model.

When we collected the microblog data, we discovered that Weibo emojis are converted by the API into textual tags; for example, the smile emoji is converted into its Chinese tag meaning "smile". This gave us the possibility of representing emojis in the word embedding. Therefore, we transformed the 109 Weibo emojis (see Figure 3) into Chinese characters and converted them into textual features for word embedding. Several examples are shown in Table 2.

Next, we collected 4,000 Weibo posts containing eight ambiguous emojis (their glyphs, rendered as images in the original, are not reproducible here), ensuring that each entry had only one pictogram of a given type (cases with more emojis of the same type were allowed). To use these posts as our training data, we asked three Chinese native speakers to annotate them into three categories: "positive", "negative" and "humorous". After one annotator labelled the polarities of all posts, the two other native speakers verified the correctness of his annotations. Whenever there was a disagreement, the final polarity was decided by all three through discussion.

²https://github.com/fxsjy/jieba

Figure 3: 109 Weibo emojis which were converted into Chinese characters.

Table 2: Examples of Textual Features of Emojis. (The emoji glyphs and their Chinese textual features were rendered as images/characters lost in extraction; the recoverable column is the emotion/implication.)

Emoji     Textual Feature   Emotion/Implication
(emoji)   (Chinese tag)     "smile"
(emoji)   (Chinese tag)     "applause"
(emoji)   (Chinese tag)     "face with tears of joy"
(emoji)   (Chinese tag)     "wink"
(emoji)   (Chinese tag)     "greedy"
(emoji)   (Chinese tag)     "speechless/awkward"
(emoji)   (Chinese tag)     "sweat"
(emoji)   (Chinese tag)     "nosepick"
(emoji)   (Chinese tag)     "snort"
(emoji)   (Chinese tag)     "upset/feel wronged"
(emoji)   (Chinese tag)     "pathetic"
(emoji)   (Chinese tag)     "disappointment"
(emoji)   (Chinese tag)     "weep"
(emoji)   (Chinese tag)     "shy"
(emoji)   (Chinese tag)     "filthy"
(emoji)   (Chinese tag)     "love face"
(emoji)   (Chinese tag)     "kissy face"
(emoji)   (Chinese tag)     "leer"
(emoji)   (Chinese tag)     "lick screen"
(emoji)   (Chinese tag)     "dog leash"
(emoji)   (Chinese tag)     "smugshrug"
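A sketch of the preprocessing pipeline described above, assuming the posts have already been downloaded to a text file (one post per line, a hypothetical name) with images/videos removed and emojis replaced by their textual tags. Jieba is named in the text; the use of gensim for word2vec, its >= 4.0 parameter names, and the window/min_count settings are our assumptions.

```python
import jieba
from gensim.models import Word2Vec

CORPUS = "weibo_posts.txt"  # hypothetical file: one cleaned post per line

def segmented_posts(path):
    """Yield each post as a list of tokens segmented by Jieba."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = jieba.lcut(line.strip())
            if tokens:
                yield tokens

# Continuous skip-gram (sg=1) with 300-dimensional vectors, as in the paper.
model = Word2Vec(
    sentences=list(segmented_posts(CORPUS)),
    vector_size=300, sg=1, window=5, min_count=5, workers=4,
)
model.save("weibo_word2vec.model")
```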
deep learning approaches where emoji polarities were con- Error analysis showed that some posts were wrongly pre- sidered, and the results of our proposed method. Table 5 de- dicted due to ambiguous usage of emojis which brought scribes the comparison of F1-scores of the above-mentioned clearly negative impact on the results. In Figure 5 we show methods. an example of such misclassification into “positive” cate- The results proved that our proposed method is more effec- gory annotated as “humorous” by annotators. was con- tive than traditional neural network-based solutions. Limited sidered as more positive than humorous by our annotators to small annotated data, the precision of the sentiment clas- (67%/0%/33%, positive/negative/humorous). It seems that sification was relatively low, but thanks to considering emoji, this particular user wrote a joke just for fun, however, our the F1-score of each category outperformed previous meth- proposed method was misguided by this “smirking” emoji. ods without considering emojis by 6.93 (humorous), 7.41 Therefore, we plan to increase the number of evaluators for (negative) and 7.19 (positive) percentage points. Our pro- annotating Weibo emojis in fine-grained humorous emotion posed emoji-aware attention-based GRU network approach to enhance the reliability of the polarity of emojis. has improved the performance showing that low-cost, small- scale data labeling is sufficient to outperform widely used state-of-the-art when emoji information is added to the deep 6 Conclusions and Future Work learning process. In this paper, we applied information on sentiment of emo- jis to a attention-based GRU network model for sentiment 5 Discussion analysis of undersized labelled data. Our experimental results show that the proposed method can significantly improve the In our proposed approach, we paid attention to emojis in mi- F1-score for predicting sentiment polarity on Weibo. croblogs and investigated how adding pictogram features to a For improving the performance of our proposed method, attention-based GRU network model for recognizing humor- in the near future we are going to increase the amount of la- ous posts which are problematic in sentiment analysis. Fig- belled data to acquire the hyperparameters automatically by ure 4 presents an example of a microblog which was correctly machine learning approaches. Furthermore, we need to in- classified by our proposed method as “humorous” while the crease the number of evaluators for annotating Weibo emojis baseline recognized it incorrectly as a positive one. and Weibo data for more fine-grained categorization of hu- This and similar entries were usually posted as a comment morous posts to enhance the reliability of our experiments. a GIF or video showing a referee who displays her or his skills We also plan to add image processing for classifying stickers in basketball by performing a slam dunk. This post seems to which also seem to convey rich emotional information. express an implied humorous nuance of exaggerated surprise Our ultimate goal is to investigate how much the newly when the poster saw how good the referee was. Because this introduced features are beneficial for sentiment analysis by 16 Table 3: Comparison results of three deep learning approaches not considering emojis (AttBiGRU stands for attention-based bi-directional GRU). 
4.3 Baselines

We compare our EAGRU method with several baseline methods, including traditional deep learning approaches such as the convolutional neural network and the long short-term memory recurrent neural network.

Convolutional Neural Network

Convolutional neural networks (CNN) utilize layers with convolving filters that are applied to local features [LeCun et al., 1998]. Originally invented for computer vision, CNN models have subsequently been shown to be effective for NLP, achieving superior results in semantic parsing [Yih et al., 2014], search query retrieval [Shen et al., 2014], sentence modeling [Kalchbrenner et al., 2014] and other traditional NLP tasks. We experimented with the CNN architecture proposed in [Kim, 2014] and applied our emoji polarities to this model. The CNN model considering Emoji Polarities (EPCNN) was trained for 10 epochs with a dropout rate of 0.5 (the same as in the proposed method); the filter size was 32 and the number of strides was 2. As activation functions we generally used ReLU, and the network output activation function was softmax.

Long Short-Term Memory Recurrent Neural Network

The long short-term memory recurrent neural network (LSTM) [Hochreiter and Schmidhuber, 1997] is well suited to classifying, processing and making predictions based on time-series data, since there can be lags of unknown duration between important events in a time series [Eyben et al., 2010]. We utilized the EPLSTM proposed in [Li et al., 2018b], trained for 10 epochs with a dropout rate of 0.5, identical to our proposed method. The validity of the model was examined by the holdout method (90%/10%, training/validation). The network output activation function was also softmax.

4.4 Performance Test

Using the trained word2vec model, we passed the word vectors of the training data into the three deep learning models for training. We collected and annotated 180 Weibo entries containing the eight emojis mentioned above as a testing set, deleting images and videos. We then used the proposed method to calculate the probabilities of each category and computed the precision, recall and F1-score. Because we assumed that in emotion expression emojis might play an equal or greater role than text, in our experiment we set the hyper-parameters \lambda_1 and \lambda_2 to 0.4 and 0.6 respectively.
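The per-category precision, recall and F1-scores reported in Tables 3–5 below can be computed as in the following sketch. The use of scikit-learn and the toy label lists are our assumptions for illustration; the original experiments do not specify the tooling.

```python
from sklearn.metrics import classification_report

labels = ["positive", "negative", "humorous"]

# Hypothetical gold labels and predictions for the 180-post test set.
y_true = ["humorous", "positive", "negative", "humorous"]  # ... 180 entries
y_pred = ["humorous", "positive", "humorous", "humorous"]  # ... 180 entries

# Per-category precision, recall and F1-score, as in Tables 3 and 4.
print(classification_report(y_true, y_pred, labels=labels, digits=4))
```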
We compared the results of sentiment classification by the deep learning approaches with and without considering emoji polarities. The results of the deep learning models without emojis are shown in Table 3. Table 4 presents the results of the two traditional deep learning approaches with emoji polarities considered, together with the results of our proposed method. Table 5 compares the F1-scores of the above-mentioned methods.

Table 3: Comparison results of three deep learning approaches not considering emojis (AttBiGRU stands for attention-based bi-directional GRU).

Categories   Evaluation   LSTM     CNN      AttBiGRU
Humorous     Precision    63.46%   64.71%   77.78%
             Recall       77.65%   77.65%   65.88%
             F1-score     69.84%   70.59%   71.33%
Negative     Precision    62.79%   70.45%   70.83%
             Recall       61.36%   70.45%   77.27%
             F1-score     62.07%   70.45%   73.91%
Positive     Precision    87.88%   88.23%   65.00%
             Recall       56.86%   58.82%   76.47%
             F1-score     69.05%   70.58%   70.27%

Table 4: Comparison results of three deep learning approaches considering emoji polarities.

Categories   Evaluation   EPLSTM   EPCNN    EAGRU
Humorous     Precision    66.02%   69.52%   82.89%
             Recall       80.00%   85.88%   74.12%
             F1-score     72.34%   76.84%   78.26%*
Negative     Precision    65.91%   79.48%   78.72%
             Recall       65.91%   70.45%   84.09%
             F1-score     65.91%   74.69%   81.32%*
Positive     Precision    90.91%   88.89%   73.68%
             Recall       58.82%   62.74%   82.35%
             F1-score     71.43%   73.56%   77.77%*
*p < 0.05

Table 5: F1-score comparison for deep learning approaches considering emoji polarities, compared to the best method not using pictograms (AttBiGRU).

             Humorous   Negative   Positive
AttBiGRU     71.33%     73.91%     70.27%
EPLSTM       72.34%     65.91%     71.43%
EPCNN        76.84%     74.69%     73.56%
EAGRU        78.26%     81.32%     77.77%

The results proved that our proposed method is more effective than traditional neural network-based solutions. Limited by the small annotated data set, the precision of the sentiment classification was relatively low, but thanks to considering emojis, the F1-score of each category outperformed the previous methods without emojis by 6.93 (humorous), 7.41 (negative) and 7.19 (positive) percentage points. Our proposed emoji-aware attention-based GRU network approach improved performance, showing that low-cost, small-scale data labeling is sufficient to outperform the widely used state of the art when emoji information is added to the deep learning process.

5 Discussion

In our proposed approach, we paid attention to emojis in microblogs and investigated how adding pictogram features to an attention-based GRU network model helps in recognizing humorous posts, which are problematic in sentiment analysis. Figure 4 presents an example of a microblog which was correctly classified by our proposed method as "humorous" while the baseline incorrectly recognized it as positive. This and similar entries were usually posted as comments on a GIF or video showing a referee who displays her or his basketball skills by performing a slam dunk. The post seems to express an implied humorous nuance of exaggerated surprise at how good the referee was. Because this expression is accompanied by an emoji, the emoji improves the classification performance and allows the implicit humorous meaning to be predicted.

Figure 4: Example of correct classification of humorous post.

Error analysis showed that some posts were wrongly predicted due to ambiguous usage of emojis, which had a clearly negative impact on the results. In Figure 5 we show an example of such a misclassification into the "positive" category of a post annotated as "humorous" by the annotators. The emoji in question was considered more positive than humorous by our annotators (67%/0%/33%, positive/negative/humorous). It seems that this particular user wrote a joke just for fun; however, our proposed method was misled by this "smirking" emoji. Therefore, we plan to increase the number of evaluators annotating Weibo emojis for fine-grained humorous emotion, to enhance the reliability of the emoji polarities.

Figure 5: Example of wrong classification into "positive" category.

6 Conclusions and Future Work

In this paper, we applied information on the sentiment of emojis to an attention-based GRU network model for sentiment analysis of undersized labelled data. Our experimental results show that the proposed method can significantly improve the F1-score when predicting sentiment polarity on Weibo.

To improve the performance of our proposed method, in the near future we are going to increase the amount of labelled data so that the hyper-parameters can be acquired automatically by machine learning approaches. Furthermore, we need to increase the number of evaluators annotating Weibo emojis and Weibo data for a more fine-grained categorization of humorous posts, to enhance the reliability of our experiments. We also plan to add image processing for classifying stickers, which also seem to convey rich emotional information. Our ultimate goal is to investigate how beneficial the newly introduced features are for sentiment analysis by feeding them to a deep learning model, which should allow us to construct a high-quality sentiment recognizer covering a wider spectrum of sentiment in the Chinese language.

7 Acknowledgment

This work was supported by JSPS KAKENHI Grant Number 17K00295.
References

[Bahdanau et al., 2014] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.

[Bridle, 1990] John S. Bridle. Probabilistic interpretation of feedforward classification network outputs, with relationships to statistical pattern recognition. In Neurocomputing, pages 227–236. Springer, 1990.

[Chen et al., 2015] Zhao Chen, Ruifeng Xu, Lin Gui, and Qin Lu. Combining convolution neural network and word sentiment sequence features for Chinese text sentiment analysis. Journal of Chinese Information Processing, 2015.

[Chen et al., 2018] Yuxiao Chen, Jianbo Yuan, Quanzeng You, and Jiebo Luo. Twitter sentiment analysis via bi-sense emoji embedding and attention-based LSTM. In 2018 ACM Multimedia Conference on Multimedia Conference, pages 117–125. ACM, 2018.

[Eyben et al., 2010] Florian Eyben, Martin Wöllmer, Alex Graves, Björn Schuller, Ellen Douglas-Cowie, and Roddy Cowie. On-line emotion recognition in a 3-d activation-valence-time continuum using acoustic and linguistic cues. Journal on Multimodal User Interfaces, 3(1-2):7–19, 2010.

[Felbo et al., 2017] Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv preprint arXiv:1708.00524, 2017.

[Guibon et al., 2016] Gaël Guibon, Magalie Ochs, and Patrice Bellot. From emojis to sentiment analysis. In WACAI 2016, 2016.

[Hochreiter and Schmidhuber, 1997] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.

[Kalchbrenner et al., 2014] Nal Kalchbrenner, Edward Grefenstette, and Phil Blunsom. A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188, 2014.

[Kim, 2014] Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.

[LeCun et al., 1998] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[Li et al., 2018a] Da Li, Rafal Rzepka, and Kenji Araki. Preliminary analysis of Weibo emojis for sentiment analysis of Chinese social media. In Proceedings of the 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018.

[Li et al., 2018b] Da Li, Rafal Rzepka, Michal Ptaszynski, and Kenji Araki. Emoticon-aware recurrent neural network model for Chinese sentiment analysis. In The Ninth IEEE International Conference on Awareness Science and Technology (iCAST 2018), 2018.

[Li et al., 2019] Da Li, Rafal Rzepka, Michal Ptaszynski, and Kenji Araki. A novel machine learning-based sentiment analysis method for Chinese social media considering Chinese slang lexicon and emoticons. In The AAAI-19 Workshop on Affective Content Analysis, AffCon 2019, 2019.

[Luong et al., 2015] Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015.

[Merity et al., 2016] Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. arXiv preprint arXiv:1609.07843, 2016.

[Mikolov et al., 2013] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.

[Moschini, 2016] Ilaria Moschini. The "face with tears of joy" emoji: a socio-semiotic and multimodal insight into a Japan-America mash-up. HERMES – Journal of Language and Communication in Business, (55):11–25, 2016.

[Novak et al., 2015] Petra Kralj Novak, Jasmina Smailović, Borut Sluban, and Igor Mozetič. Sentiment of emojis. PLoS ONE, 10(12):e0144296, 2015.

[Peng et al., 2017] Haiyun Peng, Erik Cambria, and Amir Hussain. A review of sentiment analysis research in Chinese language. Cognitive Computation, 9(4):423–435, 2017.

[Shen et al., 2014] Yelong Shen, Xiaodong He, Jianfeng Gao, Li Deng, and Grégoire Mesnil. Learning semantic representations using convolutional neural networks for web search. In Proceedings of the 23rd International Conference on World Wide Web, pages 373–374. ACM, 2014.

[Sukhbaatar et al., 2015] Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. End-to-end memory networks. In Advances in Neural Information Processing Systems, pages 2440–2448, 2015.

[Tan and Zhang, 2008] Songbo Tan and Jin Zhang. An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications, 34(4):2622–2629, 2008.

[Wang et al., 2013] Xinyu Wang, Chunhong Zhang, Yang Ji, Li Sun, Leijia Wu, and Zhana Bao. A depression detection model based on sentiment analysis in micro-blog social network. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 201–213. Springer, 2013.

[Yang et al., 2016] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489, 2016.

[Yih et al., 2014] Wen-tau Yih, Xiaodong He, and Christopher Meek. Semantic parsing for single-relation question answering. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 643–648, 2014.

[Zhou et al., 2016] Peng Zhou, Wei Shi, Jun Tian, Zhenyu Qi, Bingchen Li, Hongwei Hao, and Bo Xu. Attention-based bidirectional long short-term memory networks for relation classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 207–212, 2016.