Distant Supervision for Emotion Classification Task using emoji2emotion

Aisulu Rakhmetullina, Dietrich Trautmann, Georg Groh
Informatics Dept., Technical University of Munich, Garching, 85748
aisulu.rakhmetullina@tum.de, dietrich.trautmann@cs.tum.edu, grohg@in.tum.de

Abstract

The growing body of research on distant supervision for emotion detection requires a reliable mapping between noisy labels and emotion classes. We propose a method for the experimental creation of such a reliable mapping, based on manually annotated data and quantitative relations between labels and classes, demonstrated on the emoji-emotion pair in the form of an emoji2emotion mapping.

1 Introduction

The Japanese word emoji means "picture + character" and, contrary to what one might assume, has no semantic connection to the English word emotion. Nevertheless, emojis very often carry the emotional state of the writer. It is therefore no surprise that, as part of digital text, emojis have been exploited in various NLP studies on sentiment analysis and emotion classification.

In later work based on machine learning, emojis are most often used as noisy labels for distant supervision. However, the matching between an emoji and a sentiment or emotion class is usually done manually [WR16]. That approach is subjective and can lead to mismatches.

The goal of this work is to propose a more reliable, experimentally grounded method for matching emojis to classes. To evaluate our method, we apply it to emoji-to-emotion mapping. Since, to our knowledge, no such experimentally created mapping exists, we introduce a name for it: emoji2emotion.

There are different emotion classification models, both discrete and dimensional. In this work we chose Plutchik's wheel of emotions [Plu91], which combines characteristics of both model types. We use its eight main emotions, known as Plutchik's Eight (anger, anticipation, joy, trust, fear, surprise, sadness and disgust), shown in Figure 1.

Copyright © 2018 held by the author(s). Copying permitted for private and academic purposes. In: S. Wijeratne, E. Kiciman, H. Saggion, A. Sheth (eds.): Proceedings of the 1st International Workshop on Emoji Understanding and Applications in Social Media (Emoji2018), Stanford, CA, USA, 25-JUN-2018, published at http://ceur-ws.org

Figure 1: Plutchik's Wheel of Emotions with Plutchik's Eight highlighted [Plu91]

2 Related Work

One of the first attempts to characterize emojis by their sentiment load was the Emoji Sentiment Ranking, the first emoji sentiment lexicon (Figure 2). It was created by [NSSM15] and provides a mapping between the 751 most frequently used emojis and sentiment. Two of its insights are valuable for us: the majority of emojis are positive, especially the most popular ones, and among tweets with emojis, inter-annotator agreement tends to be higher.

In [ERA+16] the authors release emoji2vec, a set of pre-trained embeddings for all Unicode emojis, learned from the emoji descriptions in the Unicode emoji standard. This is one example of mapping emojis into another form that can be incorporated into machine learning tasks. More generally, representation learning and the use of pre-trained word embeddings are popular in natural language processing applications focused on social media.

In several works [BFMP13], [HBF+15], [JLL+14], [KZM14], emoticons were used to create a lexicon for later use in a knowledge-based approach to sentiment analysis or emotion detection. These works have in common the use of a large number of emoticon types, usually hundreds. Later machine-learning-based works, in contrast, use emoticons and emojis as noisy labels for distant supervision tasks; examples are [Rea05], [GBH09], [DTR10] and [ZDWX12].

The recent paper [FMS+17] presents a project called DeepMoji and shows that diversifying the set of noisy labels for distant supervision allows models to learn richer representations. The authors obtained state-of-the-art performance on 8 benchmark datasets for sentiment, emotion and sarcasm detection, which demonstrates the effectiveness of the noisy-label approach. Furthermore, their analyses confirm the assumption that a diversity of emotional labels improves performance compared to previous distant supervision methods.

3 Data Acquisition and Annotation

In this section, the creation of the manually annotated corpus is described in detail. First, the acquisition of data for annotation is explained in three steps: emoji list creation, tweet crawling and tweet preprocessing. Second, the annotation process is presented in another three steps: tweet filtering, annotation and averaging of label vectors, and analysis of the resulting corpus.

3.1 Data Acquisition

The first step in creating a corpus of emoji-containing tweets is to choose the list of emojis. To select the most popular emojis on Twitter and in online text in general, we consulted the Emojitracker project [etr13] as well as the Emoji Sentiment Ranking table [NSSM15]. By applying a threshold to each ranking (>100,000,000 occurrences for Emojitracker and >100 for the Emoji Sentiment Ranking), 31 emojis were picked from the first list and 50 from the second. We selected the emojis in the intersection of both lists and additionally handpicked some emojis that appeared in the top lists but not in the intersection. This produced a set of 43 emojis. We then calculated the distribution percentages for each source and averaged them; the average percentages were used to reproduce the same natural balance in our corpus.

The second step of corpus creation is data collection based on the results of the previous step. In this paper, we use easily accessible Twitter data, crawled with the help of the tweepy library. In total, 84,777 tweets containing emojis were crawled. It turned out that the vast majority of them (92.3%) contain only one emoji type, usually occurring exactly once (the average emoji count per tweet is 1.2). We therefore decided to focus on tweets with a single emoji type; after filtering out tweets with multiple emoji types or with emoji types outside our emoji list, 74,670 tweets remained for training purposes.

The last step in creating the corpus for labelling is tweet preprocessing, in which the raw tweets downloaded in the previous step are turned into ready tweets. To do so, the number of emoji types in a tweet is counted, as well as the number of occurrences of each emoji type present, and user tags, hashtags and URLs are replaced by placeholders.

3.2 Data Annotation

To start the annotation process, we picked 500 tweets subject to additional requirements intended to enhance the quality of the tweets to be annotated. The requirements were:

• The tweet contains no URLs or user tags. This is common practice in NLP and excludes meaningless parts of the text.
• The tweet contains no hashtags. Even though [DTR10] found hashtags useful for automated sentiment analysis, we decided to eliminate them to increase readability for annotators.
• The tweet contains from 5 to 15 words, so that tweets are neither too short nor too long.
• The tweet contains no more than 2 uppercase words, also for readability reasons.
• The tweet contains no unlemmatizable words (using spaCy's lemmatizer). This serves data purity as well as the understandability of the text for annotators.
• The tweet contains none of a set of keywords (a list compiled manually after a review of the corpus), in order to eliminate spam tweets.

After choosing these 500 tweets, 3 annotators were asked to evaluate them using a web interface we created. For each tweet they could choose an arbitrary number of emotions (including none) out of Plutchik's Eight and set an intensity value from 1 to 3 for each. The resulting labels were averaged under the rule that more than half of the annotators must agree on a label. The resulting corpus consists of 500 labelled tweets, where each label is a vector of size 8 containing the intensities of the 8 emotions.

In the annotated set nearly half of the tweets carry only one emotion type, and the other half a combination of them (up to 4 of the 8 at once), resulting in 1.1 emotions per tweet on average. The most prevalent emotion was joy, which appeared in 57% of tweets to some extent. The other emotions were far less frequent, each appearing in a quarter or less of the tweets.

Table 1 presents the statistics of the emotion and emotion-combination distributions over the dataset. For clarity, emotions and emotion combinations are grouped into positive, negative and neutral groups, under the assumption that joy and trust are positive; sadness, anger, disgust and fear are negative; and no emotion (neutral), anticipation and surprise are neutral. Combinations were assigned by their prevailing sentiment; in case of a tie between positive and negative emotions, the combination was placed in the neutral category.

The macro distribution shows that tweets with positive emotions dominate at about 60%, with negative and neutral tweets making up the rest. It was expected that positive tweets would be more frequent (as stated in [NSSM15]); nevertheless, the class distribution is quite imbalanced.
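The selection requirements above can be sketched as a simple filter. This is a minimal illustration under our own assumptions, not the authors' code: the regular expressions, the SPAM_KEYWORDS list and the function name are ours, and the spaCy lemmatization check is omitted to keep the sketch self-contained.

```python
import re

# Hypothetical spam keywords; the paper's actual list was built by hand
# after a review of the corpus and is not published.
SPAM_KEYWORDS = {"giveaway", "follow back", "free iphone"}

def passes_filters(text: str) -> bool:
    """Approximate the paper's tweet-selection requirements."""
    # No URLs or user tags (@mentions).
    if re.search(r"https?://\S+|@\w+", text):
        return False
    # No hashtags.
    if re.search(r"#\w+", text):
        return False
    words = text.split()
    # From 5 to 15 words.
    if not 5 <= len(words) <= 15:
        return False
    # At most 2 fully uppercase words.
    if sum(1 for w in words if w.isalpha() and w.isupper()) > 2:
        return False
    # No spam keywords.
    lowered = text.lower()
    if any(k in lowered for k in SPAM_KEYWORDS):
        return False
    return True
```

Note that the "no unlemmatizable words" requirement would additionally need a spaCy pipeline pass per tweet, which is why it is left out of this sketch.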
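The majority-agreement averaging described above can be sketched as follows. The paper states only that more than half of the annotators must agree on a label; encoding each annotator's choices as a vector of 0 (not selected) or intensity 1-3, and averaging the non-zero intensity votes of the agreeing annotators, are our assumptions.

```python
def aggregate_labels(annotations, n_emotions=8):
    """Combine per-annotator label vectors into one gold label vector.

    `annotations` is a list of per-annotator vectors of length
    `n_emotions`; entry 0 means "emotion not selected", 1-3 is the
    chosen intensity.  An emotion is kept only when a strict majority
    of annotators selected it; its intensity is then the mean of the
    non-zero votes (the intensity-merging rule is our assumption).
    """
    n = len(annotations)
    result = []
    for e in range(n_emotions):
        votes = [a[e] for a in annotations if a[e] > 0]
        if len(votes) * 2 > n:  # strict majority agrees on this emotion
            result.append(sum(votes) / len(votes))
        else:
            result.append(0.0)
    return result
```

With 3 annotators this keeps exactly the emotions that at least 2 of them selected, matching the "more than half" rule.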
4 Mapping emoji2emotion

Using the annotated dataset from the previous step, we computed the percentage of emoji occurrences per emotion and vice versa. To create a mapping, we checked every possible emoji-emotion pair against the following two conditions. First, the percentage with which the emoji appears in the subset of tweets carrying a certain emotion must be at least equal to the median of all such percentages. Second, the emotion must appear in the tweets containing that emoji at least half of the time. The resulting mapping is shown in Table 2.

Table 1: Distribution of positive, neutral and negative emotions across the resulting corpus

Table 2: Results of the emoji2emotion mapping

To evaluate the quality of the mapping, we use it to produce noisy labels for the emotion annotation subtask of SemEval 2007 task 14, Affective Text [sem07]. That task explores the connection between emotions and lexical semantics. Since the task is carried out in an unsupervised setting, only test data is provided: 1,000 short texts (news headlines) annotated with intensities for 6 emotions (Anger, Disgust, Fear, Happiness, Sadness, Surprise), i.e. Ekman's Six. Because 6 of Plutchik's Eight correspond to Ekman's Six, this data is compatible with ours. We therefore reduced the number of classes from 8 to 6 and labelled the 74,670 tweets from the Data Acquisition step with the emoji2emotion mapping to obtain training data. We used the coarse version of SemEval's test set and labelled our training set with binary vectors.

Table 3: Results of applying emoji2emotion to task 14 of SemEval 2007 [sem07]

To train our models, we turned the news headlines in the test set as well as the tweet texts in the training set into word embeddings using the word2vec methodology and the open source code of emoji2vec. We then fed these word embeddings together with the noisy labels to 4 classifiers (SGD, Naive Bayes, Random Forest and k-NN) from the scikit-learn library. Using the trained models we predicted an emotion category for each of the 1,000 test headlines. The resulting precision, recall and F1 scores are presented in Table 3; bold values mark the maxima, while green values outperform SemEval's best scores.

It is evident that the training data is imbalanced towards certain emotion categories, which we attribute to the number of emojis picked per emotion; the training results reflect that bias. Avoiding it requires a more balanced training set and, in turn, a more balanced mapping. To achieve that, more training data will be needed in the next run of the experiment, which we leave for future development of this work.

5 Findings and Contribution

We propose a method for experimentally mapping emojis to sentiment or emotion classes based on a special processing of manually annotated data. The processing consists of finding the quantitative relation between emoji and emotion in the form of a co-occurrence percentage and then thresholding it. To implement the method, we annotated a corpus of 500 emoji-containing tweets with the help of 3 human judges, and constructed the mapping from the averaged annotation labels as described above. Due to the significant imbalance of the emotion distribution across the dataset, the mapping was created for only 4 emotion categories, and it was evaluated by using it as a source of noisy labels for an emotion detection task on those 4 emotions. The results on the emotion detection task show that it is feasible to continue in this direction by increasing the size of the annotated corpus and further tuning the training parameters.

The resulting corpus of manually labelled emoji-containing tweets is shared online (https://github.com/Aisulu/emoji2emotion) for the benefit of the scientific community.
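The co-occurrence-and-thresholding procedure recapped above can be sketched as follows. This is a minimal reconstruction under our own assumptions: the data layout (one (emoji, emotion set) pair per annotated tweet) and the function name are ours, not the authors'.

```python
from statistics import median

def build_mapping(pairs):
    """Derive an emoji -> emotions mapping from annotated tweets.

    `pairs` holds one (emoji, emotion_set) entry per annotated tweet,
    where emotion_set contains the emotions assigned to that tweet.
    """
    # Count totals and emoji/emotion co-occurrences.
    emoji_total, emotion_total, co = {}, {}, {}
    for emoji, emotions in pairs:
        emoji_total[emoji] = emoji_total.get(emoji, 0) + 1
        for emo in emotions:
            emotion_total[emo] = emotion_total.get(emo, 0) + 1
            co[(emoji, emo)] = co.get((emoji, emo), 0) + 1

    # Condition 1: the emoji's share within an emotion's tweets must
    # reach the median of all such shares.
    shares = {pair: c / emotion_total[pair[1]] for pair, c in co.items()}
    med = median(shares.values())

    mapping = {}
    for (emoji, emo), share in shares.items():
        # Condition 2: the emotion must appear in at least half of the
        # tweets containing this emoji.
        if share >= med and co[(emoji, emo)] / emoji_total[emoji] >= 0.5:
            mapping.setdefault(emoji, set()).add(emo)
    return mapping
```

Only pairs passing both thresholds enter the mapping, which is why a skewed emotion distribution in the annotated corpus directly limits how many emotion categories can be mapped.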
6 Challenges and Limitations

After the annotation process it was evident to us that labelling for 8 classes with 3 intensity levels each places a high cognitive load on the annotators, taking 18 seconds per tweet on average. Even though we knew that an increase in the number of classes slows down labelling [BKT+13], the effect was stronger than expected and reduced the final corpus size. As a result, not all emotions were represented in the dataset in sufficient numbers, which effectively collapses the label space onto fewer classes.

7 Future Work

We aim to find a less time-consuming form of the annotation process for users, in order to increase the size of the manually annotated corpus. After that we plan to repeat the experimental procedure.

References

[BFMP13] Marina Boia, Boi Faltings, Claudiu-Cristian Musat, and Pearl Pu. A :) is worth a thousand words: How people attach sentiment to emoticons and words in tweets. In Proceedings of the 2013 International Conference on Social Computing, SOCIALCOM '13, pages 345–350, Washington, DC, USA, 2013. IEEE Computer Society.

[BKT+13] Michael Brooks, Katie Kuksenok, Megan K. Torkildson, Daniel Perry, John J. Robinson, Taylor J. Scott, Ona Anicello, Ariana Zukowski, Paul Harris, and Cecilia R. Aragon. Statistical affect detection in collaborative chat. In Proceedings of the 2013 Conference on Computer Supported Cooperative Work, CSCW '13, pages 317–328, New York, NY, USA, 2013. ACM.

[DTR10] Dmitry Davidov, Oren Tsur, and Ari Rappoport. Enhanced sentiment learning using twitter hashtags and smileys. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, COLING '10, pages 241–249, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.

[ERA+16] Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bosnjak, and Sebastian Riedel. emoji2vec: Learning emoji representations from their description. CoRR, abs/1609.08359, 2016.

[etr13] Emojitracker, 2013.

[FMS+17] Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017.

[GBH09] Alec Go, Richa Bhayani, and Lei Huang. Twitter sentiment classification using distant supervision. 2009.

[HBF+15] Alexander Hogenboom, Danella Bal, Flavius Frasincar, Malissa Bal, Franciska De Jong, and Uzay Kaymak. Exploiting emoticons in polarity classification of text. J. Web Eng., 14(1-2):22–40, March 2015.

[JLL+14] Fei Jiang, Yiqun Liu, Huanbo Luan, Min Zhang, and Shaoping Ma. Microblog Sentiment Analysis with Emoticon Space Model, pages 76–87. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.

[KZM14] Svetlana Kiritchenko, Xiaodan Zhu, and Saif M. Mohammad. Sentiment analysis of short informal texts. J. Artif. Int. Res., 50(1):723–762, May 2014.

[NSSM15] Petra Kralj Novak, Jasmina Smailovic, Borut Sluban, and Igor Mozetic. Sentiment of emojis. 2015.

[Plu91] R. Plutchik. The Emotions. University Press of America, 1991.

[Rea05] Jonathon Read. Using emoticons to reduce dependency in machine learning techniques for sentiment classification. In Proceedings of the ACL Student Research Workshop, ACLstudent '05, pages 43–48, Stroudsburg, PA, USA, 2005. Association for Computational Linguistics.

[sem07] Affective text. SemEval task 14, 2007.

[WR16] I. D. Wood and S. Ruder. Emoji as emotion tags for tweets. In Proceedings of the Emotion and Sentiment Analysis Workshop, LREC 2016, Portorož, Slovenia, pages 76–79, 2016.

[ZDWX12] Jichang Zhao, Li Dong, Junjie Wu, and Ke Xu. Moodlens: An emoticon-based sentiment analysis system for chinese tweets. 2012.