<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Learning Emoji Embeddings using Emoji Co-occurrence Network Graph</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anurag Illendula</string-name>
          <email>aianurag09@iitkgp.ac.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manish Reddy Yedulla</string-name>
          <email>es15btech11012@iith.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics</institution>
          ,
          <addr-line>IIT Kharagpur</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Engineering Science</institution>
          ,
          <addr-line>IIT Hyderabad</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>23</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>The use of emoji on social media platforms has increased rapidly over the last few years. The majority of social media posts are laden with emoji, and users often use more than one emoji in a single post to express their emotions and to emphasize certain words in a message. Emoji co-occurrence can therefore help us understand how emoji are used in social media posts and what they mean in that context. In this paper, we investigate whether emoji co-occurrences can be used as a feature to learn emoji embeddings, which can be used in many downstream applications such as sentiment analysis and emotion identification in social media text. We utilize 147 million tweets which contain emojis and build an emoji co-occurrence network. Then, we train a network embedding model to embed emojis into a low dimensional vector space. We evaluate our embeddings using sentiment analysis and emoji similarity experiments, and experimental results show that our embeddings outperform the current state-of-the-art results for sentiment analysis tasks.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Emojis are the 21st century's successor to the emoticon. They arose from the need to communicate body language and facial expressions during text conversations. They are two-dimensional visual embodiments of everyday aspects of life which were standardized by the Unicode Consortium in 2010 as part of Unicode 6.0. Emoji have proliferated throughout the globe and have particularly become a part of popular culture in the West. They have been adopted by almost all social media platforms and messaging services. Emojis serve many purposes during online communication, among which conveying emotion is one of the primary uses. According to the latest statistics released by Emojipedia in June 2017, the number of emojis has increased to 2,666, posing challenges to applications that list them on small hand-held devices such as mobile phones. To overcome this challenge, the emoji keyboards in most smartphones group emoji into the categories listed in Table 1.</p>
      <p>Many recent Natural Language Processing (NLP) systems rely on word representations in a finite-dimensional vector space. These NLP systems mainly use pre-trained word embeddings obtained from word2vec [MSC+13], GloVe [PSM14] or fastText [BGJM16]. Earlier, GloVe embeddings were used for training most NLP systems, but fastText word embeddings can achieve much higher accuracies in NLP systems involving social media data because the fastText model learns sub-word information. Emoji embeddings have been of fundamental importance in improving the accuracies of many emoji understanding tasks. Recent research showed that emoji embeddings can enhance the performance of emoji prediction [FMS+17, BBS17], emoji similarity [WBSD17b], and emoji sense disambiguation tasks [WBSD17a, SPWT17]. These emoji representations have also been efficient in understanding the behavior of emojis in different contexts. The need to learn emoji representations for improving the performance of social NLP systems has been recognized by Eisner et al. [ERA+16] and Barbieri et al. [BRS16], among others, who used traditional approaches, including the skip-gram and CBOW models, to learn emoji embeddings.</p>
      <p>Information networks such as publication networks and the World Wide Web are characterized by the interplay between various content and a sophisticated underlying knowledge structure. Graph embedding models help to make use of the information in large-scale information networks by embedding them into a finite-dimensional vector space, and these embeddings have shown great success in various NLP tasks such as node classification [BCM11], link prediction [LNK07] and classification [YRS+14] tasks. These graph embedding models have been of crucial importance and have enhanced the performance of word similarity and word analogical reasoning tasks using language networks [TQW+15]. The analysis of emoji co-occurrence network graphs can help us understand emojis from different perspectives. We hypothesize that emojis which co-occur in a tweet carry the same sentiment as the overall sentiment of the tweet. Consider the tweet "I got betrayed by [emoji], I want to kill you [emoji]": both emojis carry negative sentiment, and the overall sentiment of the tweet is also negative. Hence we investigate whether emoji co-occurrence could be a better feature for learning emoji representations that improve the accuracy of classification tasks. In this paper, we introduce an approach to learn emoji representations using an emoji co-occurrence network graph and a large-scale information network embedding model, and we evaluate our embeddings using the gold-standard dataset for the sentiment analysis task.</p>
      <p>This paper is organized as follows. Section 2 discusses related work in the field of emoji understanding and learning network representations. Section 3 discusses the process of creating an emoji co-occurrence network from our Twitter corpus. Section 4 explains our model architecture for learning emoji representations from the emoji co-occurrence network graph. Section 5 reports the accuracies obtained by our emoji embeddings on the gold-standard dataset for the sentiment analysis task and on the emoji similarity task. We discuss the reason behind the high accuracies obtained for the sentiment analysis task in Section 6, followed by plans for future work in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>One notable contribution to the field of emoji understanding is EmojiNet (http://emojinet.knoesis.org/home.php) by Wijeratne et al. [WBSD16, WBSD17a], the largest machine-readable emoji sense inventory, which helps computers understand emojis. In this work, Wijeratne et al. connected emojis and their senses to the corresponding words in BabelNet [NP12] using their respective BabelNet IDs. EmojiNet opened the door to many emoji understanding tasks such as emoji similarity, emoji prediction, and emoji sense disambiguation.</p>
      <p>Further work by Wijeratne et al. [WBSD17b] addressed the challenge of measuring emoji similarity using the semantics of emoji. They defined two types of semantic embeddings using the textual senses and the textual descriptions of emojis. Prior work by Barbieri et al. [BRS16] and Eisner et al. [ERA+16] used traditional approaches to learn emoji embeddings. The semantic embeddings achieved accuracies which outperformed the previous state-of-the-art results in the sentiment analysis task; this high accuracy is attributed to the fact that semantic embeddings can capture syntactic, semantic, and sentiment features of emojis.</p>
      <p>Seyednezhad et al. [SM17] created a network using the emoji co-occurrences in the same tweet; they claim that each edge weight can help us understand the context in which a user uses multiple emojis. This emoji network also enabled them to explain the use of co-occurring emojis from different perspectives, and to understand emoji usage by examining possible relations between these special characters in common text. Fede et al. [FHSM17] studied different characteristics of this emoji co-occurrence network graph, including how users employ a sequence of emojis in different contexts.</p>
      <p>Information networks have been widely used to store large amounts of information. Many researchers have proposed graph embedding models in the machine learning literature which allow us to embed the nodes of large information networks into a low dimensional vector space [PARS14, GL16, CLX15]. These embeddings have helped address many tasks such as node classification, visualization, and link prediction.</p>
    </sec>
    <sec id="sec-3">
      <title>Data and Network</title>
      <p>The emoji network is constructed using a Twitter corpus of 147 million tweets crawled over a period of 2 months (from 6th August 2016 to 8th September 2016) by Wijeratne et al. [WBSD17a]. We filter the tweets and only consider those which contain multiple emojis. This reduces the number of distinct tweets in the dataset to 14.3 million. Figure 1 shows the distribution of the number of tweets for the most frequently occurring emojis. Each tweet generates a polygon of n sides, where n is the number of emojis embedded in the tweet. The construction of the emoji network is straightforward, and Figure 2 illustrates the construction of emoji polygons with the help of different examples.</p>
      <p>The weight of an edge signifies the number of co-occurrences of the two emojis sharing the edge across the complete Twitter corpus. For example, in the tweets shown in Figure 2 the emoji pair ([emoji], [emoji]) appears twice, hence the weight of the edge connecting these two emojis is 2. The weights of all other edges in the emoji network are calculated in the same way.</p>
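      <p>The edge-weight computation described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: it assumes that every pair of distinct emojis in a tweet is connected by an edge (the polygon construction in Figure 2 may instead connect only some pairs), and the function name, the toy corpus and the shortcode strings standing in for emoji glyphs are all hypothetical.</p>

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(tweets):
    """Map each unordered emoji pair to the number of tweets in which both occur."""
    edge_weights = Counter()
    for emojis in tweets:
        # Sorting gives every unordered pair one canonical key; set() counts
        # a repeated emoji only once per tweet (an assumption of this sketch).
        distinct = sorted(set(emojis))
        for pair in combinations(distinct, 2):
            edge_weights[pair] += 1
    return edge_weights


# Hypothetical toy corpus: each tweet is reduced to the emojis it contains,
# written as shortcodes in place of the actual glyphs.
toy_tweets = [
    [":joy:", ":sob:"],
    [":joy:", ":sob:", ":heart_eyes:"],
    [":heart_eyes:", ":heart_eyes:", ":joy:"],
]
network = build_cooccurrence_network(toy_tweets)
print(network[(":joy:", ":sob:")])
```

      <p>In this toy corpus the pair (:joy:, :sob:) occurs in two tweets, so its edge receives weight 2, mirroring the example in the text.</p>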
      <p>The emoji co-occurrence network created using the tweets in Figure 2 is shown in Figure 3. We input the emoji co-occurrence network graph to our graph embedding model to learn 300-dimensional emoji embeddings, and we evaluate our embeddings using the gold-standard dataset for sentiment analysis [NSSM15]. We use this dataset because the current state-of-the-art results [ERA+16] for sentiment analysis were obtained on it.</p>
      <p>Here we discuss two different measures of the proximity between two nodes of the co-occurrence network graph, and the model developed by Tang et al. [TQW+15] to learn the node representations of a network graph.</p>
      <p>First Order Proximity: The first order proximity is defined as the local pairwise proximity between two vertices, given by the weight of the edge joining them. The first order proximity between two vertices u and v is the weight <inline-formula><tex-math>w_{uv}</tex-math></inline-formula> of the edge (u, v). It follows from the definition that the first order proximity between any two non-connected vertices is zero.</p>
      <p>[Figure: example tweets (Tweet 1 to Tweet 4), each marked as a negative sentiment tweet or a positive sentiment tweet.]</p>
      <p>Second Order Proximity: The second order proximity is defined as the similarity between the neighbourhood network structures of two vertices. For example, let u be an emoji node and let <inline-formula><tex-math>p_u = (w_{(u,1)}, w_{(u,2)}, \ldots, w_{(u,|V|)})</tex-math></inline-formula> denote the first order proximity of u with all the vertices; the second order proximity between u and v is then defined as the similarity between <inline-formula><tex-math>p_u</tex-math></inline-formula> and <inline-formula><tex-math>p_v</tex-math></inline-formula>. If u and v share no common neighbouring vertex, their second order proximity is zero.</p>
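      <p>As a rough illustration of the two proximity measures, the sketch below computes both on a small weighted network. The text does not say which similarity is applied to the proximity vectors, so cosine similarity is used here purely as an assumption; the toy graph and all names are hypothetical.</p>

```python
import math

# Hypothetical undirected toy network: weights[u][v] is the co-occurrence count.
weights = {
    "a": {"b": 2.0, "c": 1.0},
    "b": {"a": 2.0, "c": 3.0},
    "c": {"a": 1.0, "b": 3.0},
    "d": {},
}


def first_order(u, v):
    """Edge weight between u and v, or zero when they are not connected."""
    return weights[u].get(v, 0.0)


def second_order(u, v):
    """Cosine similarity (an assumed choice) between the vectors p_u and p_v."""
    vertices = sorted(weights)
    p_u = [first_order(u, x) for x in vertices]
    p_v = [first_order(v, x) for x in vertices]
    dot = sum(a * b for a, b in zip(p_u, p_v))
    norm = math.sqrt(sum(a * a for a in p_u)) * math.sqrt(sum(b * b for b in p_v))
    # A vertex without neighbours has a zero vector; its proximity is set to zero.
    return dot / norm if norm else 0.0


print(first_order("a", "b"), second_order("a", "b"), second_order("a", "d"))
```

      <p>Vertices a and b are directly connected and also share the neighbour c, so both proximities are positive, whereas the isolated vertex d has zero proximity to every other vertex under both measures.</p>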
      <sec id="sec-3-1">
        <title>Network embedding using first order proximity</title>
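        <p>Let <inline-formula><tex-math>\vec{u}_i</tex-math></inline-formula> and <inline-formula><tex-math>\vec{u}_j</tex-math></inline-formula> denote the embeddings, in a d-dimensional vector space, of the vertices <inline-formula><tex-math>v_i</tex-math></inline-formula> and <inline-formula><tex-math>v_j</tex-math></inline-formula>, where (i, j) is an undirected edge in the network graph. The joint probability which signifies the proximity between the vertices <inline-formula><tex-math>v_i</tex-math></inline-formula> and <inline-formula><tex-math>v_j</tex-math></inline-formula> is defined as</p>
        <disp-formula><label>(1)</label><tex-math>p_1(v_i, v_j) = \frac{1}{1 + \exp(-\vec{u}_i^{T} \vec{u}_j)}</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>\vec{u}_i \in R^d</tex-math></inline-formula> is the low dimensional representation, also called the embedding, of emoji node <inline-formula><tex-math>v_i</tex-math></inline-formula>. Equation (1) defines a probability distribution <inline-formula><tex-math>p_1(\cdot, \cdot)</tex-math></inline-formula> over the space <inline-formula><tex-math>V \times V</tex-math></inline-formula>. With <inline-formula><tex-math>w_{ij}</tex-math></inline-formula> denoting the weight of the edge between the nodes <inline-formula><tex-math>v_i</tex-math></inline-formula> and <inline-formula><tex-math>v_j</tex-math></inline-formula>, the empirical probability <inline-formula><tex-math>\hat{p}_1(i, j)</tex-math></inline-formula> is defined as</p>
        <disp-formula><label>(2)</label><tex-math>\hat{p}_1(i, j) = \frac{w_{ij}}{W}, \qquad W = \sum_{(i,j) \in E} w_{ij}</tex-math></disp-formula>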
        <p>To preserve the first order proximity between the vertices of the network graph, we minimize the objective function <inline-formula><tex-math>O_1</tex-math></inline-formula>, which is the distance between the empirical probability function and the proximity function.</p>
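        <disp-formula><label>(3)</label><tex-math>O_1(i, j) = d(\hat{p}_1(i, j), p_1(i, j))</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>d(\hat{p}_1(i, j), p_1(i, j))</tex-math></inline-formula> is defined as the distance between the two probability distributions. Summing over all edges gives</p>
        <disp-formula><label>(4)</label><tex-math>O_1 = \sum_{(i,j) \in E} O_1(i, j)</tex-math></disp-formula>
        <p>Replacing <inline-formula><tex-math>d(\cdot, \cdot)</tex-math></inline-formula> by the KL-divergence and omitting constant terms, the objective function reduces to</p>
        <disp-formula><label>(5)</label><tex-math>O_1 = -\sum_{(i,j) \in E} w_{ij} \log p_1(v_i, v_j)</tex-math></disp-formula>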
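        <p>To make the first order objective concrete, the following sketch evaluates the sigmoid joint probability and the weighted negative log-likelihood on a toy graph. It only evaluates the objective and performs no training; the edge list, the two-dimensional embeddings and the function names are hypothetical.</p>

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def joint_probability(u_i, u_j):
    """p1(v_i, v_j): sigmoid of the inner product of the two embeddings."""
    return 1.0 / (1.0 + math.exp(-dot(u_i, u_j)))


def first_order_objective(edges, emb):
    """O1: negative sum over edges of w_ij * log p1(v_i, v_j)."""
    return -sum(w * math.log(joint_probability(emb[i], emb[j])) for i, j, w in edges)


# Hypothetical toy graph: (vertex, vertex, co-occurrence weight).
edges = [("a", "b", 2.0), ("a", "c", 1.0)]
aligned = {"a": [1.0, 0.0], "b": [1.0, 0.0], "c": [1.0, 0.0]}
opposed = {"a": [1.0, 0.0], "b": [-1.0, 0.0], "c": [-1.0, 0.0]}

print(first_order_objective(edges, aligned))
print(first_order_objective(edges, opposed))
```

        <p>Embeddings that point in the same direction for connected vertices give a lower objective value than embeddings that point in opposite directions, which is the behaviour the minimization relies on.</p>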
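        <p>Network embedding using second order proximity: The second order proximity of two nodes <inline-formula><tex-math>(v_i, v_j)</tex-math></inline-formula> measures the similarity of their neighbourhood network structures. This measure is applicable to both directed and undirected graphs. In this case we consider each vertex together with the "context" of the vertex, which relates to the distribution of the neighbours of the given vertex. For each edge <inline-formula><tex-math>(v_i, v_j)</tex-math></inline-formula>, the probability of the "context" <inline-formula><tex-math>v_j</tex-math></inline-formula> given <inline-formula><tex-math>v_i</tex-math></inline-formula> is defined by Equation (6), in which <inline-formula><tex-math>\vec{u}'_j</tex-math></inline-formula> denotes the representation of <inline-formula><tex-math>v_j</tex-math></inline-formula> when it is treated as a context:</p>
        <disp-formula><label>(6)</label><tex-math>p_2(v_j \mid v_i) = \frac{\exp(\vec{u}_j'^{T} \vec{u}_i)}{\sum_{k=1}^{|V|} \exp(\vec{u}_k'^{T} \vec{u}_i)}</tex-math></disp-formula>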
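        <p>The context probability of Equation (6) is a softmax over the context vectors of all vertices. A minimal sketch follows, assuming separate dictionaries for vertex embeddings and context embeddings; the names and toy values are hypothetical.</p>

```python
import math


def context_probabilities(i, emb, ctx):
    """p2(. | v_i): softmax of the inner products between ctx[k] and emb[i]."""
    scores = {k: sum(a * b for a, b in zip(ctx[k], emb[i])) for k in ctx}
    peak = max(scores.values())  # subtracting the maximum avoids overflow in exp
    exps = {k: math.exp(s - peak) for k, s in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Hypothetical toy values: one vertex embedding and three context embeddings.
emb = {"a": [1.0, 0.0]}
ctx = {"a": [0.0, 0.0], "b": [2.0, 0.0], "c": [-1.0, 0.0]}
probs = context_probabilities("a", emb, ctx)
print(probs)
```

        <p>The denominator sums over every vertex in the network, which becomes expensive for large graphs; this is why an approximation such as negative sampling is commonly preferred in practice.</p>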
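        <p>where <inline-formula><tex-math>|V|</tex-math></inline-formula> is the number of vertices. As mentioned before, the second order proximity assumes that vertices with similar distributions over the contexts are similar to each other. To preserve the second order proximity, the distance between the context distribution <inline-formula><tex-math>p_2(\cdot \mid v_i)</tex-math></inline-formula> represented in the low dimensional vector space and the empirical distribution <inline-formula><tex-math>\hat{p}_2(\cdot \mid v_i)</tex-math></inline-formula> must be minimized. Hence our objective function <inline-formula><tex-math>O_2</tex-math></inline-formula> in this case is</p>
        <disp-formula><label>(7)</label><tex-math>O_2 = \sum_{v_i \in V} \lambda_i \, d(\hat{p}_2(\cdot \mid v_i), p_2(\cdot \mid v_i))</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>d(\cdot, \cdot)</tex-math></inline-formula> is the distance between two probability distributions and <inline-formula><tex-math>\lambda_i</tex-math></inline-formula> represents the importance of the vertex <inline-formula><tex-math>v_i</tex-math></inline-formula> during optimization. As in the previous case, the empirical distribution is defined as</p>
        <disp-formula><label>(8)</label><tex-math>\hat{p}_2(v_j \mid v_i) = \frac{w_{ij}}{d_i}, \qquad d_i = \sum_{k \in N(i)} w_{ik}</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>w_{ij}</tex-math></inline-formula> is the weight of the edge <inline-formula><tex-math>(v_i, v_j)</tex-math></inline-formula>, <inline-formula><tex-math>d_i</tex-math></inline-formula> is the out-degree of vertex <inline-formula><tex-math>v_i</tex-math></inline-formula>, and <inline-formula><tex-math>N(i)</tex-math></inline-formula> is the set of neighbours of <inline-formula><tex-math>v_i</tex-math></inline-formula>. Setting <inline-formula><tex-math>\lambda_i = d_i</tex-math></inline-formula> for simplicity and replacing <inline-formula><tex-math>d(\cdot, \cdot)</tex-math></inline-formula> with the KL-divergence gives</p>
        <disp-formula><label>(9)</label><tex-math>O_2 = -\sum_{(i,j) \in E} w_{ij} \log p_2(v_j \mid v_i)</tex-math></disp-formula>
        <p>The negative sampling approach proposed by Mikolov et al. [MSC+13] is used to optimize the objective function, which helps us represent every vertex of the network graph in the low dimensional vector space. For each edge <inline-formula><tex-math>(v_i, v_j)</tex-math></inline-formula>, the objective function simplifies to</p>
        <disp-formula><label>(10)</label><tex-math>\log \sigma(\vec{u}_j'^{T} \vec{u}_i) + \sum_{n=1}^{K} E_{v_n \sim P_n(v)} [\log \sigma(-\vec{u}_n'^{T} \vec{u}_i)]</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>\sigma(x) = 1/(1 + \exp(-x))</tex-math></inline-formula> is the sigmoid function, K is the number of negative samples, and <inline-formula><tex-math>P_n(v)</tex-math></inline-formula> is the noise distribution from which they are drawn. We use the stochastic gradient descent algorithm [RRWN11] for optimizing the objective function and we update the model parameters on a batch of edges. Thus, after completion of the training process, we obtain the embedding corresponding to each vertex. The gradients with respect to the embedding <inline-formula><tex-math>\vec{u}_i</tex-math></inline-formula> of vertex <inline-formula><tex-math>v_i</tex-math></inline-formula> are given by Equations (11) and (12).</p>
        <p>[Equations (11) and (12): gradient expressions, not recoverable from the source.]</p>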
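        <p>The negative sampling objective for a single edge can be sketched as below, with the expectation replaced by K sampled vertices. The noise distribution proportional to degree raised to the power 0.75 is a common choice in the negative sampling literature and is an assumption here, since the text does not state which distribution was used; the helper names and toy values are hypothetical, and the sketch does not exclude the positive context from the negatives.</p>

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def draw_negatives(degrees, k, rng):
    """Sample k vertices with probability proportional to degree ** 0.75 (assumed)."""
    vertices = list(degrees)
    noise = [degrees[v] ** 0.75 for v in vertices]
    return rng.choices(vertices, weights=noise, k=k)


def negative_sampling_term(u_i, ctx_j, negative_ctx):
    """Objective for one edge (i, j): positive term plus one term per negative sample."""
    positive = log_sigmoid(dot(ctx_j, u_i))
    negative = sum(log_sigmoid(-dot(ctx_n, u_i)) for ctx_n in negative_ctx)
    return positive + negative


# Hypothetical toy setup with K = 3 negative samples.
rng = random.Random(0)
degrees = {"a": 3.0, "b": 5.0, "c": 4.0, "d": 1.0}
ctx = {"a": [0.5, 0.1], "b": [1.0, 0.0], "c": [-1.0, 0.2], "d": [0.0, -0.3]}
u_a = [1.0, 0.0]
negatives = draw_negatives(degrees, 3, rng)
value = negative_sampling_term(u_a, ctx["b"], [ctx[n] for n in negatives])
print(negatives, value)
```

        <p>Each evaluation touches only K + 1 context vectors instead of all |V| of them, which is what makes training on a batch of edges tractable.</p>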
        <p>We learn the node embeddings <inline-formula><tex-math>\vec{u}_i</tex-math></inline-formula> by optimizing the objective function in both cases and call the resulting embeddings first order embeddings and second order embeddings, respectively. The model is trained using the TensorFlow [ABC+16] library on a CUDA GPU with the RMSProp gradient descent algorithm, a learning rate of 0.025, a batch size of 128, 300,000 batches, and 300-dimensional embeddings. The code is made available on GitHub<sup>1</sup>; the 300-dimensional emoji embeddings learned using the emoji co-occurrence network can also be accessed at this link.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <sec id="sec-4-1">
        <title>Sentiment Analysis</title>
        <p>In this section, we report the accuracies obtained for the sentiment analysis task on the gold-standard dataset developed by Novak et al. [NSSM15]. Our experiments achieve accuracies which outperform the current state-of-the-art results for sentiment analysis on this dataset. The gold-standard dataset<sup>2</sup> consists of 64,599 manually labelled tweets classified into positive, negative, and neutral sentiment. The dataset is divided into a training set of 51,679 tweets, 9,405 of which contain emoji, and a testing set of 12,920 tweets, 2,295 of which contain emoji. In both training and testing sets, 29% of the tweets are labelled as positive, 25% as negative, and 46% as neutral. We use the pre-trained fastText word embeddings<sup>3</sup> [MGB+18] to embed words into a low dimensional vector space. We calculate the bag-of-words vector for each tweet and use this vector as a feature to train a support vector machine and a random forest model on the training set, and we evaluate the classification accuracies on the whole testing dataset of 12,920 tweets. The accuracies obtained for the classification task using the first order embeddings surpass the current state-of-the-art results [WBSD17b].</p>
        <p><sup>1</sup> https://bit.ly/2I5hYNd</p>
        <p><sup>2</sup> https://bit.ly/2pLaKVZ</p>
        <p>Emoji similarity<sup>4</sup> is one of the important challenges which should be addressed for the development of emoji keyboards, since the current emoji keyboard consists of 2,666 emojis and the complete list cannot be accommodated on a small screen. The emoji embeddings learned using the emoji co-occurrence network graph can be used to calculate the similarity between emojis, with cosine similarity as the similarity measure, and to group emojis which have high similarity values. Grouping emojis in this way decreases the number of distinct entries to display and helps accommodate the grouped emojis on a small screen. In this section, we report the emoji similarity values found using the first order embeddings and second order embeddings.</p>
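        <p>The tweet-level feature used for the sentiment classifiers in this section can be sketched as follows. The text does not spell out how the bag-of-words vector is formed, so this sketch assumes it is the mean of the embeddings of all tokens in the tweet, with word and emoji embeddings stored in one lookup table; the function name, the shortcode standing in for an emoji, and the toy vectors are hypothetical.</p>

```python
def tweet_vector(tokens, embeddings, dim):
    """Average the embeddings of the tokens (words and emojis) found in the lookup."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # No token has an embedding: fall back to the zero vector.
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Hypothetical two-dimensional lookup; the paper uses 300 dimensions.
toy_embeddings = {"happy": [1.0, 0.0], "day": [0.0, 1.0], ":joy:": [1.0, 1.0]}
features = tweet_vector(["happy", "day", ":joy:", "unknown"], toy_embeddings, dim=2)
print(features)
```

        <p>The resulting fixed-length vector is the kind of input that a support vector machine or random forest classifier expects for every tweet, regardless of tweet length.</p>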
        <p>We consider the cosine similarity to be the similarity measure between two embeddings. Let <inline-formula><tex-math>\vec{a}</tex-math></inline-formula> and <inline-formula><tex-math>\vec{b}</tex-math></inline-formula> be the vectors which represent the embeddings of emojis <inline-formula><tex-math>e_1</tex-math></inline-formula> and <inline-formula><tex-math>e_2</tex-math></inline-formula> respectively; the similarity between these two emojis is calculated as</p>
        <disp-formula><label>(13)</label><tex-math>similarity(e_1, e_2) = \frac{\vec{a} \cdot \vec{b}}{\|\vec{a}\| \, \|\vec{b}\|}</tex-math></disp-formula>
        <p><sup>3</sup> https://bit.ly/2FMTB4N</p>
        <p><sup>4</sup> Our main objective is not to address the emoji similarity task. Our main objective is to demonstrate the usefulness of our emoji embeddings for the sentiment analysis task.</p>
        <p>The analogical reasoning task introduced by Mikolov et al. [MSC+13] defines syntactic and semantic analogies. For example, consider the semantic analogy USA : Washington = India : ?, where we fill the gap (represented by "?") by finding the word from the vocabulary whose embedding (represented by vec(x)) is closest to vec(Washington) - vec(USA) + vec(India). Here cosine similarity is used as the similarity measure between the two vectors.</p>
        <p>We extend the semantic analogy task introduced by Mikolov et al. [MSC+13] to emojis by replacing words with emojis. Consider an emoji analogy ([emoji] : [emoji]) = ([emoji] : ?); we fill the gap (represented by "?") by finding the emoji from the complete list of emojis whose embedding (represented by vec(x)) is closest to vec([emoji]) - vec([emoji]) + vec([emoji]). Table 8 reports some of the interesting analogies found using the first order and second order embeddings.</p>
        <p>The high accuracy for the classification task using the first order embedding model is due to the fact that all co-occurring emojis in a tweet possess the same sentiment, hence these embeddings increase the accuracy of the classification model. Consider the tweet "Who uses this emoji [emoji], I miss the one that had this mouth and these eyes [emoji]! ... where did he go?! Why did he leave?!": we observe the overall sentiment of this tweet to be positive, and we also observe that all the emojis embedded in the tweet possess the same sentiment. Hence emoji co-occurrence is a better attribute for learning emoji embeddings, which can increase the accuracy of sentiment analysis and other related classification tasks.</p>
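        <p>The cosine similarity measure and the analogy search described in this section can be sketched together. The two-dimensional toy vectors are hypothetical and chosen so that the answer is easy to verify by hand; excluding the three query items from the candidate set is a common convention and is an assumption here, as the text does not say whether they were excluded.</p>

```python
import math


def cosine_similarity(a, b):
    """Inner product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def solve_analogy(a, b, c, embeddings):
    """Return x for 'a : b = c : x', the item closest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = [k for k in embeddings if k not in (a, b, c)]
    return max(candidates, key=lambda k: cosine_similarity(embeddings[k], target))


# Hypothetical toy embeddings, not learned from any corpus.
toy = {
    "USA": [1.0, 0.0],
    "Washington": [1.0, 1.0],
    "India": [2.0, 0.0],
    "New Delhi": [2.0, 1.0],
    "Tokyo": [0.0, 3.0],
}
answer = solve_analogy("USA", "Washington", "India", toy)
print(answer)
```

        <p>The same function applies unchanged to emoji analogies once the dictionary keys are emojis and the values are their learned embeddings.</p>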
        <p>We use Spearman's rank correlation coefficient to compare the emoji similarity ranks obtained using the first order and second order embeddings learned from the emoji co-occurrence network with the emoji similarity ranks of the gold-standard dataset<sup>5</sup>. Table 7 reports the Spearman's correlation coefficient obtained by our emoji embeddings. According to the correlation coefficients, the first order emoji embeddings show a strong correlation (0.6 &lt; ρ &lt; 0.79).</p>
        <p>The top 6 most similar emoji pairs observed using the first order embeddings are reported in Table</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Future Work</title>
      <p>The use of external knowledge has improved the accuracies of various natural language processing tasks and outperformed many state-of-the-art results. Bian et al. [BGL14] leveraged external knowledge in learning word embeddings, which gave better accuracies in word similarity and word analogy tasks. The first set of examples in the EmoSim508<sup>7</sup> dataset looks more convincing than the results in Table 5 and Table 6, the reason being that semantic knowledge helps us compare the similarity between different emojis efficiently. Using Bian et al.'s work as a reference, we could work on incorporating external knowledge from EmojiNet into our network embedding model, which might further improve the accuracies of sentiment analysis and emoji similarity tasks.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>We are grateful to Sanjaya Wijeratne and Amit Sheth for thought-provoking discussions on the topic. We acknowledge support from the Indian Institute of Technology Kharagpur. Any opinions, findings, and conclusions/recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of Indian Institute of Technology Kharagpur.</p>
      <p><sup>6</sup> The semantic similarity is the similarity measure obtained using semantic embeddings developed by Wijeratne et al.</p>
      <p><sup>7</sup> https://bit.ly/2GztSR2</p>
      <p>[ABC+16] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A system for large-scale machine learning. In OSDI, volume 16, pages 265-283, 2016.</p>
      <p>[BBS17] Francesco Barbieri, Miguel Ballesteros, and Horacio Saggion. Are emojis predictable? arXiv preprint arXiv:1702.07285, 2017.</p>
      <p>[BCM11] Smriti Bhagat, Graham Cormode, and S Muthukrishnan. Node classification in social networks. In Social network data analytics, pages 115-148. Springer, 2011.</p>
      <p>[BGJM16] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606, 2016.</p>
      <p>[BGL14] Jiang Bian, Bin Gao, and Tie-Yan Liu. Knowledge-powered deep learning for word embedding. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 132-148. Springer, 2014.</p>
      <p>[BRS16] Francesco Barbieri, Francesco Ronzano, and Horacio Saggion. What does this emoji mean? A vector space skip-gram model for Twitter emojis. In LREC, 2016.</p>
      <p>[CLX15] Shaosheng Cao, Wei Lu, and Qiongkai Xu. GraRep: Learning graph representations with global structural information. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pages 891-900. ACM, 2015.</p>
      <p>[ERA+16] Ben Eisner, Tim Rocktäschel, Isabelle Augenstein, Matko Bošnjak, and Sebastian Riedel. emoji2vec: Learning emoji representations from their description. arXiv preprint arXiv:1609.08359, 2016.</p>
      <p>[FHSM17] Halley Fede, Isaiah Herrera, SM Mahdi Seyednezhad, and Ronaldo Menezes. Representing emoji usage using directed networks: A Twitter case study. In International Workshop on Complex Networks and their Applications, pages 829-842. Springer, 2017.</p>
      <p>[FMS+17]</p>
      <p>[PARS14] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 701-710. ACM, 2014.</p>
      <p>[PSM14] Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532-1543, 2014.</p>
      <p>[WBSD17b] Sanjaya Wijeratne, Lakshika Balasuriya, Amit P. Sheth, and Derek Doran. A</p>
      <p>[YRS+14]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Xiao</given-names>
            <surname>Yu</surname>
          </string-name>
          , Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han.
          <article-title>Personalized entity recommendation: A heterogeneous information network approach</article-title>
          .
          <source>In Proceedings of the 7th ACM international conference on Web search and data mining</source>
          , pages
          <volume>283</volume>
          -
          <fpage>292</fpage>
          . ACM,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>