 Learning Emoji Embeddings using Emoji Co-occurrence
                   Network Graph

                     Anurag Illendula                                   Manish Reddy Yedulla
                Department Of Mathematics                         Department of Engineering Science
                      IIT Kharagpur                                        IIT Hyderabad
                  aianurag09@iitkgp.ac.in                             es15btech11012@iith.ac.in



                         Abstract

    Usage of emoji in social media platforms has seen a rapid increase over the last few years. The majority of social media posts are laden with emoji, and users often use more than one emoji in a single post to express their emotions and to emphasize certain words in a message. Utilizing emoji co-occurrence can be helpful to understand how emoji are used in social media posts and what they mean in the context of those posts. In this paper, we investigate whether emoji co-occurrences can be used as a feature to learn emoji embeddings that can serve many downstream applications such as sentiment analysis and emotion identification in social media text. We utilize 147 million tweets which have emojis in them and build an emoji co-occurrence network. Then, we train a network embedding model to embed emojis into a low dimensional vector space. We evaluate our embeddings using sentiment analysis and emoji similarity experiments, and experimental results show that our embeddings outperform the current state-of-the-art results for sentiment analysis tasks.

Copyright © 2018 held by the author(s). Copying permitted for private and academic purposes.

In: S. Wijeratne, E. Kiciman, H. Saggion, A. Sheth (eds.): Proceedings of the 1st International Workshop on Emoji Understanding and Applications in Social Media (Emoji2018), Stanford, CA, USA, 25-JUN-2018, published at http://ceur-ws.org

1    Introduction

Emojis are the 21st century's successor to the emoticon. They arose from the need to communicate body language and facial expressions during text conversations. They are two-dimensional visual embodiments of everyday aspects of life, standardized by the Unicode Consortium in 2010 as part of Unicode 6.0. Emoji have proliferated throughout the globe and have particularly become part of popular culture in the West; they have been adopted by almost all social media platforms and messaging services. Emojis serve many purposes during online communication, among which conveying emotion is one of the primary uses. According to the latest statistics released by Emojipedia in June 2017, the number of emojis has increased to 2,666, posing challenges to applications that list them on small hand-held devices such as mobile phones. To overcome this challenge, the emoji keyboards in most smartphones categorize emoji into the categories listed in Table 1.

   Many recent Natural Language Processing (NLP) systems rely on word representations in a finite-dimensional vector space. These NLP systems mainly use pre-trained word embeddings obtained from word2vec [MSC+13], GloVe [PSM14], or fastText [BGJM16]. Earlier, GloVe embeddings were used to train most NLP systems, but fastText-trained word embeddings achieve much higher accuracies on NLP systems involving social media data because the fastText model can learn sub-word information. Emoji embeddings have been of fundamental importance in improving the accuracies of many emoji understanding tasks. Recent research has shown that emoji embeddings can enhance the performance of emoji prediction [FMS+17, BBS17], emoji similarity [WBSD17b], and emoji sense disambiguation tasks [WBSD17a, SPWT17]. These emoji representations have also been effective in understanding the behavior of emojis in different contexts. The need to learn emoji representations for improving the performance of social NLP systems has been recognized by Eisner et al. [ERA+16] and Barbieri et al. [BRS16], among others, who used traditional approaches, including the skip-gram and CBOW models, to learn emoji embeddings.
   Information networks such as publication networks and the World Wide Web are characterized by the interplay between content and a sophisticated underlying knowledge structure. Graph embedding models make it possible to take large-scale information networks and embed their nodes into a finite-dimensional vector space, and these embeddings have shown great success in tasks such as node classification [BCM11], link prediction [LNK07], and recommendation [YRS+14]. These graph embedding models have been of crucial importance and have enhanced the performance of word similarity and word analogical reasoning tasks using language networks [TQW+15]. The analysis of emoji co-occurrence network graphs can help us understand emojis from different perspectives. We hypothesize that emojis which co-occur in a tweet carry the same sentiment as the overall sentiment of the tweet. Consider the tweet, "I got betrayed by , I want to kill you "; here both emojis carry negative sentiment, and the overall sentiment of the tweet is also negative. Hence we investigate whether emoji co-occurrence could be a better feature for learning emoji representations that improve the accuracy of classification tasks. In this paper, we introduce an approach to learn emoji representations using an emoji co-occurrence network graph and a large-scale information network embedding model, and we evaluate our embeddings on the gold-standard dataset for the sentiment analysis task.

[Figure 1: Distribution of tweets across various emojis, in lakhs (1 lakh = 100,000).]

              Table 1: Emoji Categories

        Category                 Emoji Examples
        Smiley and People             , ,
        Animals and Nature            , ,
        Food and Drink                , ,
        Activity                      , ,
        Travel and Places             , ,
        Objects                       , ,
        Symbols                       , ,
        Flags                         , ,

   This paper is organized as follows. Section 2 discusses related work in the fields of emoji understanding and learning network representations. Section 3 describes the process of creating an emoji co-occurrence network from our Twitter corpus. Section 4 explains our model architecture for learning emoji representations from the emoji co-occurrence network graph. Section 5 reports the accuracies obtained by our emoji embeddings on the gold-standard dataset for the sentiment analysis task and on emoji similarity tasks. We discuss the reasons behind the high accuracies obtained for the sentiment analysis task in Section 6, followed by plans for future work in Section 7.

2    Related Work

One exciting work by Wijeratne et al. [WBSD16, WBSD17a] in the field of emoji understanding is EmojiNet (http://emojinet.knoesis.org/home.php), the largest machine-readable emoji sense inventory, which helps computers understand emojis. In this work, Wijeratne et al. connected emojis and their senses to the corresponding words in BabelNet [NP12] using their respective BabelNet IDs. EmojiNet opened doors to many emoji understanding tasks such as emoji similarity, emoji prediction, and emoji sense disambiguation.

   Other interesting work by Wijeratne et al. [WBSD17b] addressed the challenge of measuring emoji similarity using the semantics of emoji. They defined two types of semantic embeddings using the textual senses and the textual descriptions of emojis. Prior work by Barbieri et al. [BRS16] and Eisner et al. [ERA+16] used traditional approaches to learn emoji embeddings. The semantic embeddings achieved accuracies which outperformed the previous state-of-the-art results on the sentiment analysis task; this high accuracy is due to the fact that semantic embeddings can learn the syntactic, semantic, and sentiment features of emojis.

   Seyednezhad et al. [SM17] created a network using emoji co-occurrences within the same tweet; they claim that each edge weight can help us understand the context in which a user combines multiple emojis. This emoji network also enabled them to justify the use of co-occurring emojis from different perspectives, and to understand emoji usage by studying possible relations between these special characters in ordinary text. Fede et al. [FHSM17] studied different characteristics of this emoji co-occurrence network graph, including users' behavior in using sequences of emojis in different contexts.
   Information networks have long been a primary means of storing large amounts of information. Many researchers have proposed graph embedding models in the machine learning literature which allow us to embed the nodes of large information networks into a low dimensional vector space [PARS14, GL16, CLX15]. These embeddings have helped address tasks such as node classification, visualization, and link prediction.

3    Data and Network

The emoji network is constructed using a Twitter corpus of 147 million tweets crawled over a period of two months (from 6th August 2016 to 8th September 2016) by Wijeratne et al. [WBSD17a]. We filter the tweets and consider only those which have multiple emojis embedded in them. This reduces the number of distinct tweets in the dataset to 14.3 million. Figure 1 shows the distribution of the number of tweets for the most frequently occurring emojis. Each tweet generates a polygon of n sides, where n is the number of emojis embedded in the tweet. The construction of the emoji network is straightforward, and Figure 2 explains the construction of emoji polygons with the help of different examples.

[Figure 2: Construction of emoji polygons from example tweets: "I got betrayed by a   ", "I would die for   ", "I can't trust so its Hi & Bye flow,   I still   them though", "Baby text back   , waiting for your reply", and "Without    respect there is no   ".]

   The weight of an edge signifies the number of co-occurrences of the emojis sharing that edge across the complete Twitter corpus. For example, in the case of the tweets shown in Figure 2, the emoji pair ( , ) appears twice, hence the weight of the edge corresponding to these two emojis is 2. The weights of all the other edges in the emoji network are calculated in the same way.

         Table 2: Most frequently co-occurring emoji pairs

            Emoji Pair      No. of Co-occurrences
            (   ,   )              230957
            (   ,   )              196970
            (   ,   )              135595
            (   ,   )              102612
            (   ,   )              102408

         Table 3: Least frequently co-occurring emoji pairs

            Emoji Pair      No. of Co-occurrences
            (   ,   )                 1
            (   ,   )                 1
            (   ,   )                 1
            (   ,   )                 1
            (   ,   )                 1

   The emoji co-occurrence network created using the tweets in Figure 2 is represented in Figure 3. We input the emoji co-occurrence network graph to our graph embedding model to learn 300-dimensional emoji embeddings, and we evaluate the embeddings using the gold-standard dataset for sentiment analysis. We use the gold-standard dataset [NSSM15] because the current state-of-the-art results [ERA+16] for sentiment analysis were obtained on it.
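   As a concrete illustration of this construction, the following Python sketch (ours, not the authors' released code; the emoji-matching regular expression and the toy corpus are assumptions made for illustration) accumulates the weighted edge list by counting, per tweet, every unordered pair of distinct emojis:

from collections import Counter
from itertools import combinations
import re

# Assumption: emojis are matched with a coarse Unicode-range regex;
# the paper does not specify the exact extraction method.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF]"
)

def emoji_cooccurrence_edges(tweets):
    """Count co-occurrences of unordered emoji pairs across tweets.

    A tweet containing n distinct emojis contributes the n*(n-1)/2
    edges of its 'emoji polygon'; edge weights accumulate over the
    whole corpus.
    """
    weights = Counter()
    for text in tweets:
        emojis = sorted(set(EMOJI_RE.findall(text)))
        if len(emojis) < 2:        # keep only tweets with multiple emojis
            continue
        for u, v in combinations(emojis, 2):
            weights[(u, v)] += 1   # edge weight = number of co-occurrences
    return weights

# On a toy corpus, a pair appearing in two tweets receives weight 2.
tweets = ["I got betrayed by 😡, I want to kill you 🔪",
          "😡🔪 never again",
          "baby text back 😍💕"]
print(emoji_cooccurrence_edges(tweets))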
4    Model

4.1    Description

Here we discuss two different measures which signify the proximity between two nodes of the co-occurrence network graph, and the model developed by Tang et al. [TQW+15] to learn node representations of a network graph.

   First Order Proximity: The first order proximity is defined as the local pairwise proximity between two vertices, which corresponds to the weight of the edge joining them. The first order proximity of an edge (u, v) is the weight w_{uv} of the edge formed by vertices u and v. It also follows from the definition that the first-order proximity between any two non-connected vertices is zero.
[Figure 3: Construction of the emoji network from Tweets 1-4 of Figure 2; edges contributed by negative sentiment tweets and positive sentiment tweets are marked differently.]

   Second Order Proximity: The second order proximity is defined as the similarity between neighbourhood network structures. For example, consider u to be an emoji node and let p_u = (w_{u,1}, w_{u,2}, \ldots, w_{u,|V|}) denote the first order proximities of the emoji node u with all the vertices (and p_v likewise for a node v); then the second order proximity between u and v is defined as the similarity between p_u and p_v. If there exists no common vertex between u and v, then the second-order proximity is zero.

4.1.1    Network embedding using first order proximity:

Let \vec{u}_i and \vec{u}_j represent the network embeddings in a d-dimensional vector space, where (i, j) is an undirected edge in the network graph. The joint probability which signifies the proximity between vertices v_i and v_j is defined as

    p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^T \cdot \vec{u}_j)}    (1)

where \vec{u}_i \in R^d is the low dimensional representation, also called the embedding, of emoji node v_i, and w_{ij} denotes the weight of the edge between nodes v_i and v_j. The distribution p_1(\cdot,\cdot) is defined over the space V \times V, and the corresponding empirical probability is defined as

    \hat{p}_1(i, j) = \frac{w_{ij}}{W}, \qquad W = \sum_{(i,j) \in E} w_{ij}    (2)

where E is the edge set of the network. To preserve the first order proximity between the vertices of the network graph, the objective function (O_1), the distance between the empirical distribution and the model distribution, is to be minimized:

    O_1(i, j) = d(\hat{p}_1(i, j), p_1(i, j))    (3)

where d(\hat{p}_1(i, j), p_1(i, j)) is the distance between the two probability distributions. Summing over all edges and replacing d(\cdot, \cdot) by the KL-divergence, the objective function reduces to

    O_1 = \sum_{(i,j) \in E} O_1(i, j)    (4)

    O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)    (5)
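   To make the first-order objective concrete, here is a minimal numpy sketch (ours; the toy edge list and names are illustrative) that evaluates O_1 of Equation (5) for a weighted edge list and a current embedding matrix:

import numpy as np

def first_order_objective(U, edges):
    """O1 = -sum over edges (i,j) of w_ij * log sigmoid(u_i . u_j), Eq. (5)."""
    total = 0.0
    for i, j, w in edges:
        score = U[i] @ U[j]                 # u_i^T . u_j
        p1 = 1.0 / (1.0 + np.exp(-score))   # Eq. (1)
        total -= w * np.log(p1)
    return total

# Toy network: 3 nodes, weighted undirected edges (i, j, w_ij).
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 300))    # 300-d embeddings, as in the paper
edges = [(0, 1, 2.0), (1, 2, 1.0)]
print(first_order_objective(U, edges))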
4.1.2    Network embedding using second order proximity:

The second order proximity of two nodes (v_i, v_j) measures the similarity of the neighbourhood network structures of v_i and v_j. This measure is applicable to both directed and undirected graphs. Our objective in this case is to look at each vertex together with the "context" of the vertex, which can be related to the distribution of neighbours of the given vertex. Hence for each edge (v_i, v_j) the probability of the "context" is defined by

    p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^T \cdot \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k'^T \cdot \vec{u}_i)}    (6)

where \vec{u}_k' is the representation of vertex v_k when it is treated as a "context" and |V| is the number of vertices. As mentioned before, the second order proximity treats vertices with similar distributions over contexts as similar vertices. To preserve the second order proximity, the distance between the context distribution p_2(\cdot \mid v_i) represented in the low dimensional vector space and the empirical distribution \hat{p}_2(\cdot \mid v_i) must be minimized. Hence our objective function (O_2) in this case is

    O_2 = \sum_{v_i \in V} \lambda_i \, d(\hat{p}_2(\cdot \mid v_i), p_2(\cdot \mid v_i))    (7)

where d(\cdot, \cdot) is the distance between two probability distributions and the variable \lambda_i captures the importance of vertex v_i during optimization. As in the previous case, the empirical distribution is defined as

    \hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i}, \qquad d_i = \sum_{k \in N(i)} w_{ik}    (8)

where w_{ij} is the weight of edge (v_i, v_j), d_i is the out-degree of vertex v_i, and N(i) is the set of neighbours of v_i. Taking \lambda_i = d_i for simplicity and replacing d(\cdot, \cdot) with the KL-divergence, the objective reduces to

    O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)    (9)
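   A matching sketch (again ours) for the second-order objective of Equation (9), using the full softmax of Equation (6) rather than the negative-sampling approximation introduced in Section 4.2:

import numpy as np

def second_order_objective(U, C, edges):
    """O2 = -sum over edges (i,j) of w_ij * log p2(v_j | v_i), Eq. (9).

    U holds vertex embeddings u_i; C holds context embeddings u'_k.
    p2 is the softmax over all contexts, Eq. (6).
    """
    scores = U @ C.T                        # scores[i, k] = u'_k . u_i
    m = scores.max(axis=1, keepdims=True)   # shift for numerical stability
    log_p2 = scores - (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True)))
    return -sum(w * log_p2[i, j] for i, j, w in edges)

rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(3, 300))
C = rng.normal(scale=0.1, size=(3, 300))
print(second_order_objective(U, C, [(0, 1, 2.0), (1, 2, 1.0)]))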
4.2    Model Optimization

The negative sampling approach proposed by Mikolov et al. [MSC+13] is used to optimize the objective functions, which lets us represent every vertex of the network graph in the low dimensional vector space. For each edge (i, j), the objective function then simplifies to:

    \log \sigma(\vec{u}_j'^T \cdot \vec{u}_i) + \sum_{n=1}^{K} E_{v_n \sim P_n(v)} [\log \sigma(-\vec{u}_{v_n}'^T \cdot \vec{u}_i)]    (10)

where \sigma(x) = 1/(1 + \exp(-x)) is the sigmoid function and K is the number of negative samples drawn from the noise distribution P_n(v). We use the asynchronous stochastic gradient descent algorithm [RRWN11] to optimize the objective function, updating the model parameters on a batch of edges. After the training process completes, we obtain the embedding corresponding to each vertex. The gradients with respect to the embedding \vec{u}_i of vertex v_i are:

    \frac{\partial O_1}{\partial \vec{u}_i} = w_{ij} \cdot \frac{\partial \log p_1(v_i, v_j)}{\partial \vec{u}_i}    (11)

    \frac{\partial O_2}{\partial \vec{u}_i} = w_{ij} \cdot \frac{\partial \log p_2(v_j \mid v_i)}{\partial \vec{u}_i}    (12)

   We learn the node embeddings \vec{u}_i by optimizing the objective function in both cases and call the resulting embeddings first order embeddings and second order embeddings respectively. The model is trained using the TensorFlow library [ABC+16] on a CUDA GPU, using the RMSProp gradient descent algorithm with a learning rate of 0.025, a batch size of 128, 300,000 batches, and 300-dimensional embeddings. The code is made available on GitHub1, and the 300-dimensional emoji embeddings learned from the emoji co-occurrence network can be accessed at the same link.

   1 https://bit.ly/2I5hYNd
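   The optimization loop can be sketched as follows (a numpy illustration of ours, not the TensorFlow implementation described above; negatives are drawn uniformly here, whereas LINE samples them from a noise distribution proportional to degree^(3/4)):

import numpy as np

def train_negative_sampling(edges, n_nodes, dim=300, K=5,
                            lr=0.025, n_batches=1000, batch_size=128):
    """SGD on the negative-sampling objective of Eq. (10).

    `edges` is a list of (i, j, w) triples; edges are sampled in
    proportion to their weight, as in LINE.
    """
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.01, size=(n_nodes, dim))  # vertex embeddings u_i
    C = np.zeros((n_nodes, dim))                     # context embeddings u'_k
    w = np.array([e[2] for e in edges], dtype=float)
    edge_p = w / w.sum()
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

    for _ in range(n_batches):
        for e in rng.choice(len(edges), size=batch_size, p=edge_p):
            i, j, _ = edges[e]
            # one positive context plus K uniformly drawn negatives
            targets = [(j, 1.0)] + [(int(n), 0.0)
                                    for n in rng.integers(0, n_nodes, size=K)]
            grad_u = np.zeros(dim)
            for t, label in targets:
                g = sigmoid(U[i] @ C[t]) - label     # d(loss)/d(score)
                grad_u += g * C[t]
                C[t] -= lr * g * U[i]
            U[i] -= lr * grad_u
    return U, C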
5    Experiments

5.1    Sentiment Analysis

In this section, we report the accuracies obtained for the sentiment analysis task on the gold-standard dataset developed by Novak et al. [NSSM15]. Our experiments achieve accuracies which outperform the current state-of-the-art results for sentiment analysis on this dataset. The gold-standard dataset2 consists of 64,599 manually labelled tweets classified into positive, negative, and neutral sentiment. The dataset is divided into a training set of 51,679 tweets, 9,405 of which contain emoji, and a testing set of 12,920 tweets, 2,295 of which contain emoji. In both the training and testing sets, 29% of the tweets are labelled as positive, 25% as negative, and 46% as neutral. We use the pre-trained fastText word embeddings3 [MGB+18] to embed words into a low dimensional vector space. We calculate the bag of words vector for each tweet, use this vector as a feature to train a support vector machine and a random forest model on the training set, and evaluate the classification accuracies on the whole testing set of 12,920 tweets. The accuracies obtained for the classification task using the first order embeddings surpass the current state-of-the-art results [WBSD17b].

      Table 4: Accuracy of the sentiment analysis task

                                   Classification accuracy
      Word Embeddings             using RF      using SVM
      State-of-the-art results      60.7           63.6
      First Order Embeddings        62.1           65.2
      Second Order Embeddings       58.7           61.9

   2 https://bit.ly/2pLaKVZ
   3 https://bit.ly/2FMTB4N
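   The evaluation pipeline can be sketched as follows (our reconstruction of the described setup; averaging the available word and emoji vectors of a tweet is one plausible reading of the "bag of words vector" step, which the text does not fully specify):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def tweet_vector(tokens, word_vecs, emoji_vecs, dim=300):
    """Average the available word/emoji vectors of a tweet's tokens."""
    vecs = [word_vecs.get(t, emoji_vecs.get(t)) for t in tokens]
    vecs = [v for v in vecs if v is not None]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def evaluate(train, test, word_vecs, emoji_vecs):
    """train/test: lists of (tokens, label). Returns (RF, SVM) accuracies."""
    Xtr = np.stack([tweet_vector(t, word_vecs, emoji_vecs) for t, _ in train])
    ytr = [y for _, y in train]
    Xte = np.stack([tweet_vector(t, word_vecs, emoji_vecs) for t, _ in test])
    yte = [y for _, y in test]
    rf = RandomForestClassifier(n_estimators=100).fit(Xtr, ytr)
    svm = SVC(kernel="linear").fit(Xtr, ytr)
    return rf.score(Xte, yte), svm.score(Xte, yte)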
5.2    Emoji Similarity

Emoji similarity4 is one of the important challenges to be addressed in the development of emoji keyboards, since the current emoji list consists of 2,666 emojis and the complete list cannot be accommodated on a small screen. The emoji embeddings learned from the emoji co-occurrence network graph can be used to calculate the similarity between emojis, with cosine similarity as the measure, and to group emojis which have high similarity values. This grouping decreases the number of distinct emojis and helps us accommodate the grouped emojis on a small screen. In this section, we report the emoji similarity values found using the first order embeddings and the second order embeddings.

   We consider the cosine distance to be the similarity measure between two embeddings. Let \vec{a} and \vec{b} be two vectors which represent the embeddings of emojis e_1 and e_2 respectively; the similarity between these two emojis is calculated as
    similarity(e_1, e_2) = \frac{\vec{a} \cdot \vec{b}}{|\vec{a}| \cdot |\vec{b}|}    (13)

   Table 5 and Table 6 report the most similar emojis found using the first order embeddings and the second order embeddings respectively. The observed results are explained in Section 6.

   4 Our main objective is not to address the emoji similarity task; it is to demonstrate the usefulness of our emoji embeddings for the sentiment analysis task.
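   A short sketch (ours) of the similarity computation of Equation (13) and the nearest-neighbour lookup on which the keyboard-grouping idea relies:

import numpy as np

def cosine_similarity(a, b):
    """Eq. (13): similarity = a . b / (|a| |b|)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def most_similar(query, emoji_vecs, topn=5):
    """Rank all other emojis by cosine similarity to `query`."""
    sims = [(e, cosine_similarity(emoji_vecs[query], v))
            for e, v in emoji_vecs.items() if e != query]
    return sorted(sims, key=lambda x: x[1], reverse=True)[:topn]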
   Table 5: Emoji similarity measured using first order embeddings

      Emoji Pair    Similarity Measure    Semantic Similarity
      (   ,   )           0.921                 0.442
      (   ,   )           0.916                 0.598
      (   ,   )           0.911                 0.623
      (   ,   )           0.909                 0.546
      (   ,   )           0.856                 0.723
      (   ,   )           0.889                 0.702
      (   ,   )           0.881                 0.737

5.3    Analogical Reasoning

The analogical reasoning task introduced by Mikolov et al. [MSC+13] defines syntactic and semantic analogies. For example, consider a semantic analogy such as USA : Washington = India : ?, where we fill the gap (represented by "?") by finding a word from the vocabulary whose embedding, vec(x), is closest to vec(Washington) - vec(USA) + vec(India). Here cosine distance is considered as the similarity measure between the two vectors.
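   The gap-filling step is vector arithmetic followed by a nearest-neighbour search; a sketch (ours), applicable to word and emoji vocabularies alike:

import numpy as np

def solve_analogy(a, b, c, vecs):
    """Return the x maximizing cos(vec(x), vec(b) - vec(a) + vec(c)).

    Answers 'a : b = c : ?'; the query items themselves are excluded,
    as in Mikolov et al.'s setup.
    """
    target = vecs[b] - vecs[a] + vecs[c]
    target = target / np.linalg.norm(target)
    best, best_sim = None, -np.inf
    for x, v in vecs.items():
        if x in (a, b, c):
            continue
        sim = float(target @ v / np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = x, sim
    return best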
   Table 6: Emoji similarity measured using second order embeddings

      Emoji Pair    Similarity Measure    Semantic Similarity
      (   ,   )           0.646                 0.662
      (   ,   )           0.606                 0.598
      (   ,   )           0.596                 0.623
      (   ,   )           0.556                 0.622
      (   ,   )           0.546                 0.916
      (   ,   )           0.540                 0.945

5.3.1    Emoji to emoji analogy

We extrapolate the semantic analogy task introduced by Mikolov et al. [MSC+13] to the context of emojis by replacing words with emojis. Consider an emoji analogy, ( : ) = ( : ?); we fill the gap (represented by "?") by finding an emoji from the complete list of emojis whose embedding, vec(x), is closest to vec( ) - vec( ) + vec( ). Table 8 reports some of the interesting analogies found using the first order and second order embeddings.

   Table 8: Emoji to emoji analogical reasoning using emoji embeddings

      First Emoji Pair    Second Emoji Pair
         (   :   )           (   :   )
         (   :   )           (   :   )
         (   :   )           (   :   )
         (   :   )           (   :   )
         (   :   )           (   :   )
6    Discussion

The high accuracy in the classification task using the first order embedding model is due to the fact that the co-occurring emojis in a tweet possess the same sentiment feature; hence, during classification, these embeddings increase the accuracy of the classification model. Consider the tweet, "Who uses this emoji , I miss the one that had this mouth    and these eyes   ! ... where did he go?! Why did he leave?!"; in this tweet we observe the overall sentiment to be positive, and we also observe that all the emojis embedded in the tweet possess the same sentiment. Hence co-occurring emojis are a good attribute for learning emoji embeddings, and they can increase the accuracy of sentiment analysis and other related classification tasks.

   We use Spearman's rank correlation coefficient to compare the emoji similarity ranks obtained using the first order and second order embeddings learned from the emoji co-occurrence network against the emoji similarity ranks of the gold-standard dataset5. Table 7 reports the Spearman's correlation coefficients obtained by our emoji embeddings. According to the correlation coefficients, the first order embeddings show a strong correlation (0.6 < ρ < 0.79).

      Table 7: Spearman's rank correlation results

      Emoji Embeddings            ρ × 100
      First Order Embeddings        74
      Second Order Embeddings       66

   5 https://bit.ly/2GztSR2
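   The rank-correlation evaluation can be reproduced with scipy's spearmanr (a sketch assuming the two rankings are available as parallel score lists over the same emoji pairs; the numbers below merely reuse the two columns of Table 5 as an example):

from scipy.stats import spearmanr

# Parallel score lists for the same emoji pairs: ours from cosine
# similarity, gold from the semantic-similarity ranking (Table 5 columns).
our_scores  = [0.921, 0.916, 0.911, 0.909, 0.856, 0.889, 0.881]
gold_scores = [0.442, 0.598, 0.623, 0.546, 0.723, 0.702, 0.737]

rho, p_value = spearmanr(our_scores, gold_scores)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")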
   The top 6 most similar emoji pairs observed using the first order embeddings are reported in Table 5. As we see from Table 5, the most similar emoji pair observed is ( , ), with a similarity measure of 0.921. The first emoji    would be used in a context where the user wishes to express displeasure over a certain issue through an act of hitting, while the other emoji    would be used in a context where the user wishes to express displeasure over a certain issue through an expression of uneasiness. Hence the high similarity measure is sound even when we consider the contexts in which the emojis are used. The results also show that our embeddings give higher similarity measures than the semantic similarity6 measure.

   The top 6 most similar emoji pairs observed using the second order embeddings are reported in Table 6. As we see from Table 6, the most similar emoji pair observed is ( , ), with a similarity measure of 0.646. The first emoji    would be used in a context where the user wishes to generate a sound or ring a bell, or in the context of celebration, while the second emoji    would be used in the context of celebration. EmojiNet lists "celebration" as a sense form for both emojis, hence the observed similarity is sound even when we consider the contexts in which these emojis are used.

7    Future Work

Usage of external knowledge has improved the accuracies of various natural language processing tasks and outperformed many state-of-the-art results. Bian et al. [BGL14] have worked on leveraging external knowledge when learning word embeddings, which gave better accuracies in word similarity and word analogy tasks. The first set of examples in the EmoSim508 dataset7 looks more convincing than the results in Tables 5 and 6, the reason being that semantic knowledge helps us compare the similarity between different emojis efficiently. Using Bian et al.'s work as a reference, we could incorporate external knowledge from EmojiNet into our network embedding model, which might further improve the accuracies of the sentiment analysis and emoji similarity tasks.

Acknowledgement

We are grateful to Sanjaya Wijeratne and Amit Sheth for thought-provoking discussions on the topic. We acknowledge support from the Indian Institute of Technology Kharagpur. Any opinions, findings, and conclusions/recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Indian Institute of Technology Kharagpur.

   6 The semantic similarity is the similarity measure obtained using the semantic embeddings developed by Wijeratne et al. [WBSD17b].
   7 https://bit.ly/2GztSR2

References

[ABC+16]   Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A system for large-scale machine learning. In OSDI, volume 16, pages 265-283, 2016.

[BBS17]    Francesco Barbieri, Miguel Ballesteros, and Horacio Saggion. Are emojis predictable? arXiv preprint arXiv:1702.07285, 2017.

[BCM11]    Smriti Bhagat, Graham Cormode, and S. Muthukrishnan. Node classification in social networks. In Social Network Data Analytics, pages 115-148. Springer, 2011.

[BGJM16]   Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606, 2016.

[BGL14]    Jiang Bian, Bin Gao, and Tie-Yan Liu. Knowledge-powered deep learning for word embedding. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 132-148. Springer, 2014.

[BRS16]    Francesco Barbieri, Francesco Ronzano, and Horacio Saggion. What does this emoji mean? A vector space skip-gram model for Twitter emojis. In LREC, 2016.

[CLX15]    Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International Conference on Information and Knowledge Management, pages 891-900. ACM, 2015.

[ERA+16]   Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, and Sebastian Riedel. emoji2vec: Learning emoji representations from their description. arXiv preprint arXiv:1609.08359, 2016.

[FHSM17]   Halley Fede, Isaiah Herrera, S.M. Mahdi Seyednezhad, and Ronaldo Menezes. Representing emoji usage using directed networks: A Twitter case study. In International Workshop on Complex Networks and their Applications, pages 829-842. Springer, 2017.
[FMS+17]   Bjarke Felbo, Alan Mislove, Anders Søgaard, Iyad Rahwan, and Sune Lehmann. Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. arXiv preprint arXiv:1708.00524, 2017.

[GL16]     Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 855-864. ACM, 2016.

[LNK07]    David Liben-Nowell and Jon Kleinberg. The link-prediction problem for social networks. Journal of the Association for Information Science and Technology, 58(7):1019-1031, 2007.

[MGB+18]   Tomas Mikolov, Edouard Grave, Piotr Bojanowski, Christian Puhrsch, and Armand Joulin. Advances in pre-training distributed word representations. In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018.

[MSC+13]   Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111-3119, 2013.

[NP12]     Roberto Navigli and Simone Paolo Ponzetto. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217-250, 2012.

[NSSM15]   Petra Kralj Novak, Jasmina Smailović, Borut Sluban, and Igor Mozetič. Sentiment of emojis. PLoS ONE, 10(12):e0144296, 2015.

[PARS14]   Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 701-710. ACM, 2014.

[PSM14]    Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532-1543, 2014.

[RRWN11]   Benjamin Recht, Christopher Re, Stephen Wright, and Feng Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 693-701, 2011.

[SM17]     S.M. Mahdi Seyednezhad and Ronaldo Menezes. Understanding subject-based emoji usage using network science. In Workshop on Complex Networks CompleNet, pages 151-159. Springer, 2017.

[SPWT17]   Amit Sheth, Sujan Perera, Sanjaya Wijeratne, and Krishnaprasad Thirunarayan. Knowledge will propel machine understanding of content: Extrapolating from current examples. In Proceedings of the International Conference on Web Intelligence, Leipzig, Germany, August 23-26, 2017, pages 1-9, 2017.

[TQW+15]   Jian Tang, Meng Qu, Mingzhe Wang, Ming Zhang, Jun Yan, and Qiaozhu Mei. LINE: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, pages 1067-1077. International World Wide Web Conferences Steering Committee, 2015.

[WBSD16]   Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, and Derek Doran. EmojiNet: Building a machine readable sense inventory for emoji. In International Conference on Social Informatics, pages 527-541. Springer, 2016.

[WBSD17a]  Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, and Derek Doran. EmojiNet: An open service and API for emoji sense discovery. In 11th International AAAI Conference on Web and Social Media (ICWSM), pages 437-446, Montreal, Canada, May 2017.

[WBSD17b]  Sanjaya Wijeratne, Lakshika Balasuriya, Amit P. Sheth, and Derek Doran. A semantics-based measure of emoji similarity. In Proceedings of the International Conference on Web Intelligence, Leipzig, Germany, August 23-26, 2017, pages 646-653, 2017.

[YRS+14]   Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pages 283-292. ACM, 2014.