<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Exploring Emoji Usage and Prediction Through a Temporal Variation Lens</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesco Barbieri</string-name>
          <email>fname.surname@snap.com</email>
          <aff>Large Scale Text Understanding Systems Lab, Barcelona, Spain</aff>
          <aff>Snap Inc. Research, Venice, California</aff>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>The frequent use of emojis on social media platforms has created a new form of multimodal social interaction, and developing methods for the study and representation of emoji semantics helps to improve future multimodal communication systems. In this paper we explore the usage and semantics of emojis over time. We compare emoji embeddings trained on corpora from different seasons and show that some emojis are used differently depending on the time of the year. Moreover, we propose a method that takes time information into account in emoji prediction systems, outperforming state-of-the-art systems. We show that, using time information, the prediction accuracy of some emojis can be significantly improved.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Emojis are frequently used on social media (Snapchat,
Twitter, Facebook, Instagram) and on communication platforms
(WhatsApp, Messenger). In turn, they create a new form of
multimodal communication, wherein images are used to enrich
standard text messages. Over the past few years, interest in
emoji research has increased, with several studies contributing
to emoji semantics [BRS16, ERA+16, WBSD17a, WBSD17b, BCC18],
sentiment analysis [NSSM15, HGS+17, KK17, RPG+18], automatic
emoji prediction [BBS17, FMS+17] and multimodal systems
[CMS15, CSG+18, BBRS18]. However, to the best of our knowledge,
the temporal dimension of emoji use has not been addressed in
past research. In this paper we explore the temporal correlation
between emoji usage and events during the year, and we show that
temporal information helps disambiguate emoji meanings. For
example, the four-leaf clover emoji is usually associated with
good-luck wishes, while in March the same emoji indicates
parties and drinking, due to St. Patrick's Day. In addition,
some emojis are naturally associated with specific seasons
(e.g., during Christmas and in Summer) or specific hours (e.g.,
by night and in the morning). We show that considering temporal
information helps predict emojis, including those that are not
time-specific.</p>
    </sec>
    <sec id="sec-2">
      <title>Datasets</title>
      <p>We first collected a corpus Cus of more than 100 million
English tweets, posted only in the U.S. (to remove spatial and
cultural influence on the data [BKRS16]) from October 2015 to
November 2017, and retrieved via the Twitter streaming API
(https://dev.twitter.com/streaming/overview). We then extracted
two datasets from Cus.</p>
      <sec id="sec-2-1">
        <title>Seasonal Emoji Dataset</title>
        <p>We divide our initial corpus into four subsets (tweets
posted in Spring, Summer, Autumn and Winter) to study the
variation of emoji usage across different seasons (Section 3).
Table 1 shows the 15 most frequent emojis of each season. While
some emojis are always among the most common, other emojis are
season-specific, appearing mainly in Autumn, in Winter, or in
Spring and Summer.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Large Scale Emoji Prediction Dataset</title>
        <p>We retain from Cus tweets containing exactly one
emoji, and only if that emoji belongs to the set of the 300 most
frequently occurring emojis. The final dataset for emoji
prediction is composed of 900,000 tweets, with 3,000 tweets per
class. In previous work, we experimentally observed that using
more than 3,000 tweets per class does not significantly improve
prediction accuracy.</p>
      </sec>
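      <p>The dataset construction step above (keep tweets with exactly one emoji, restrict to the 300 most frequent emojis, cap each class at 3,000 tweets) can be sketched as follows. This is a minimal illustrative sketch, not the authors' pipeline: the codepoint-range emoji detector and the function names are assumptions.</p>
      <preformat>
```python
from collections import Counter

# Rough pictograph codepoint block, for illustration only; a real
# pipeline would use a complete emoji table.
EMOJI_RANGE = range(0x1F300, 0x1FB00)

def emojis_in(text):
    # Hypothetical emoji detector over single codepoints.
    return [ch for ch in text if ord(ch) in EMOJI_RANGE]

def build_prediction_dataset(tweets, top_k=300, per_class=3000):
    # Keep tweets with exactly one emoji, count emoji frequencies.
    counts = Counter()
    singles = []
    for t in tweets:
        es = emojis_in(t)
        if len(es) == 1:
            singles.append((t, es[0]))
            counts[es[0]] += 1
    # Restrict labels to the top_k most frequent emojis.
    top = {e for e, _ in counts.most_common(top_k)}
    # Cap each retained label at per_class tweets.
    kept, quota = [], Counter()
    for t, e in singles:
        if e in top and quota[e] != per_class:
            quota[e] += 1
            kept.append((t, e))
    return kept
```
      </preformat>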
    </sec>
    <sec id="sec-3">
      <title>Do Emoji Semantics and Usage Change Over Seasons?</title>
      <p>Emoji semantics are difficult to analyze due to the
subjective nature of emoji meanings, especially when it comes to
describing emotions. Nevertheless, we study emoji semantics by
association, i.e., we describe an emoji either with a set of
semantically close emojis or by emoji pair co-occurrence in the
same tweet. To this end, we train skip-gram word embedding
models [MSC+13] on the four different subsets (Spring, Summer,
Autumn and Winter) of our seasonal dataset. Each model embeds
emojis within a high-dimensional space (300 dimensions, 6-token
window) where distance metrics translate to semantic closeness
and co-occurrence. Following [BKRS16], we first evaluate emoji
semantics by describing each emoji with its k-Nearest-Neighbours
(k-NN) for each season. Secondly, for each model, we produce a
correlation matrix that encodes the semantic correlation of
pairs of emojis appearing in the same tweet. We then compare the
four matrices to see if their correlation statistics are
preserved across different seasons.</p>
      <p>For each season, emojis are associated with their k-NN,
with k=10. We then look at the overlap of these NN for each
emoji among the different models. This way, we can investigate
whether a specific emoji shares the same set of NN across
distinct seasons, and thus whether that emoji preserves its
meaning across seasons. We also measure the number of NN that
overlap in all the seasons, to find the emojis with the smallest
meaning variation during the year. Results are shown in Table 2.
In the top half of the table, we notice that emojis related to
music, animals, sweets, and emotions are not influenced by
seasonality, i.e., they have the same set of Nearest-Neighbours
in the four season-specific vector models (10-NN overlap of at
least 8).</p>
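      <p>The k-NN overlap measure described above can be sketched as follows, assuming the season-specific embeddings are plain dictionaries mapping each emoji to a vector (toy 2-D vectors stand in for the 300-D skip-gram embeddings; the function names are ours, not from the paper):</p>
      <preformat>
```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def knn(emb, target, k):
    # k nearest neighbours of `target` by cosine similarity.
    others = sorted((e for e in emb if e != target),
                    key=lambda e: cosine(emb[target], emb[e]),
                    reverse=True)
    return set(others[:k])

def knn_overlap(season_embs, target, k=10):
    # Number of neighbours shared by the target's k-NN sets
    # in every season-specific embedding model.
    sets = [knn(emb, target, k) for emb in season_embs]
    return len(set.intersection(*sets))
```
      </preformat>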
      <p>The emojis whose NN set varies the most across seasons
are the ones listed in the bottom half of Table 2. Many are
sport-related emojis, probably because some periods of the year
include more sports events than others. The graduation-cap
emoji, used in a school/university graduation context, also
seems to change meaning across seasons: its Nearest-Neighbours
are party- and heart-related emojis in Spring, and
school-related emojis in Autumn.</p>
      <p>Another season-dependent example is the pine tree emoji
(Table 3). The pine tree is associated with vegetation, camping
and sunrise-related emojis in Spring and Summer, while in Autumn
and Winter it co-occurs with Christmas-related emojis. The gift
emoji presents a similar behaviour: its three nearest neighbours
in Spring and Summer differ from the three emojis closest to it
in Winter.</p>
      <p>We also evaluate how the semantics of pairs of emojis
are preserved across different seasons. To this end, we compute
for each season a 300×300 correlation matrix, where the
correlation of emojis i and j is encoded as the cosine
similarity between their 300-D feature vectors extracted from
the season model. We then compare pairs of seasons by evaluating
the Pearson correlation between their respective matrices. The
most correlated matrices are Spring and Summer (0.871), while
the lowest correlation is between Spring and Winter (0.837).
However, all the matrices are highly correlated, suggesting that
only a small subset of emojis have semantics that vary across
seasons.</p>
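      <p>The matrix comparison above can be sketched as follows: for each season we flatten the upper triangle of the pairwise cosine-similarity matrix, then compute the Pearson correlation between two seasons' flattened matrices (a sketch with assumed toy vectors; the helper names are ours):</p>
      <preformat>
```python
import math

def cosine(u, v):
    # Cosine similarity between two vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def sim_matrix(emb, vocab):
    # Flattened upper triangle of the pairwise cosine-similarity matrix.
    return [cosine(emb[a], emb[b])
            for i, a in enumerate(vocab)
            for b in vocab[i + 1:]]

def pearson(x, y):
    # Pearson correlation between two equal-length sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```
      </preformat>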
      <p>Table 4 shows, for each pair of seasons, the pairs of
emojis with the highest difference in similarity across those
seasons. The differences between Spring and Summer (first
column) do not look as significant as the differences between
other seasons. We can spot a few interesting cases. For example,
one pair is more correlated during Autumn and Winter than during
Spring and Summer; this is due to a famous case of doping in
sport that occurred during the Autumn/Winter of 2016. Another
pair characterizes track-related competitions happening during
Spring and Summer. A further interesting case is the gift emoji,
which in Autumn and Winter relates to a Christmas gift, as it is
highly correlated with Christmas-related emojis, while in Spring
and Summer it is mostly used as a birthday gift, being
associated with birthday-related emojis. One more pair can
relate either to mass-shooting events or could simply suggest
that in Autumn students have a hard time with the beginning of
the new school year. Finally, one of the emojis that seems to be
used differently in Autumn and Winter is the skull, likely due
to the usage of this emoji during Halloween time.</p>
    </sec>
    <sec id="sec-5">
      <title>How Does Temporal Information Help Emoji Prediction?</title>
      <p>In this section we evaluate how temporal information can
improve the accuracy of emoji prediction models. We use the same
experimental settings as [BBS17], except that we predict 300
emoji classes instead of 20. We use temporal information as an
input to the classifier in addition to the tweet. The date is
encoded as a vector of three dimensions, where the first
dimension is the month (1-12), the second dimension is the day
of the week (1-7), and the last dimension is the local hour
(1-24) when the tweet was posted. In the following we describe
our classifier architecture, with two variants to fuse temporal
information with text.</p>
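      <p>The date encoding described above can be sketched as follows: one lookup table of 10-dimensional vectors per field (month, weekday, hour), whose rows would be learned jointly with the rest of the network. The random initialisation and function names here are illustrative assumptions, not the authors' implementation.</p>
      <preformat>
```python
import random

random.seed(0)
EMB_DIM = 10  # per-field embedding size, as in the text

def make_table(vocab_size, dim=EMB_DIM):
    # One vector per index; randomly initialised here,
    # learned during training in the real model.
    return [[random.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(vocab_size)]

# Vocabularies of 12 months, 7 weekdays, 24 hours.
MONTH, WEEKDAY, HOUR = make_table(12), make_table(7), make_table(24)

def date_embedding(month, weekday, hour):
    # Concatenate the vectors for month (1-12), weekday (1-7) and
    # local hour (1-24) into a single 30-D date embedding.
    return MONTH[month - 1] + WEEKDAY[weekday - 1] + HOUR[hour - 1]
```
      </preformat>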
      <sec id="sec-7-1">
        <title>Emoji Prediction Model</title>
        <p>We start from the state-of-the-art emoji prediction
classifier [BBS17] and build two different methods, early and
late temporal signal fusion, to incorporate the date
information. The two entry points for fusing temporal
information are evaluated in Section 4.2. Inspired by [BBS17],
the main architecture begins by extracting two different
embeddings. The Char B-LSTM takes a sequence of characters and
outputs a word-embedded vector as in [LLM+15]. The Char B-LSTM
output is then concatenated with the word representation as in
[BBS17] and passed to the Word LSTM and Word Attention units. We
use the attention mechanism introduced in [YYD+16], which can be
considered a weighted average of the output of the Word LSTM,
where the weights are learned during training. Finally, the
fully connected layers and the softmax play the role of the
final classifier.</p>
        <p>As described above, the date information is encoded as
a vector of three dimensions (month, weekday, and hour). For
each of these dimensions we create a lookup table of vectors of
size 10, with vocabularies of 12, 7, and 24 respectively. In
this way, we can learn vectors for each month, day of the week
and hour, and use them for the final classification. These
vectors are concatenated together and incorporated into the base
system in two different ways: at an early stage or at a late
stage. The early stage consists in concatenating the date
embeddings to the word representation (char+word embeddings) and
passing them to the Word LSTM. The late incorporation consists
in concatenating the date embeddings with the output vector of
the word attention before making the final predictions. We only
use one of the two methods at a time, without combining them.</p>
        <p>We evaluate the automatic prediction of the 300 emojis
with three different systems: without date information, with the
early method to incorporate date information, and with the late
date method.</p>
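        <p>A minimal sketch of the two fusion variants, with vectors as plain Python lists (the helper names are assumptions; in the actual model these concatenations happen inside the network graph):</p>
        <preformat>
```python
def early_fusion(word_vecs, date_vec):
    # Early stage: concatenate the date embedding to every
    # word representation before the Word LSTM.
    return [wv + date_vec for wv in word_vecs]

def late_fusion(attention_out, date_vec):
    # Late stage: concatenate the date embedding with the word
    # attention output, before the final fully connected layers.
    return attention_out + date_vec
```
        </preformat>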
        <p>In Table 5 we report results for the three models
using Precision, Recall, Macro F1, Accuracy at 1, 3, 5, 10, and
Coverage Error. Coverage Error (CE) is defined as the average
number of labels that have to be included in the final
prediction such that all true labels are predicted.</p>
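        <p>Coverage error as defined above can be computed as follows for the single-true-label case (a sketch; each example's scores are assumed to be a dict mapping emoji labels to classifier scores, and the function name is ours):</p>
        <preformat>
```python
def coverage_error(true_labels, score_dicts):
    # Average number of top-ranked labels needed so that the true
    # label of each example is included in the prediction.
    total = 0
    for y, scores in zip(true_labels, score_dicts):
        # Rank labels by decreasing score; 1-based rank of the true label.
        ranking = sorted(scores, key=scores.get, reverse=True)
        total += ranking.index(y) + 1
    return total / len(true_labels)
```
        </preformat>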
        <p>From the results we can see that the best system is
the model with early incorporation of the date, as it
outperforms all the other models. The late date method is the
worst system, even if it seems to create better prediction
distributions than the without-date system, since its CE is
lower. In Table 6 we report the emojis with the highest gain in
F1 from without date to early date, to understand which emojis
depend most on the time information. Among them we can see
emojis that clearly depend on the month (St. Patrick's Day,
Summer) or the hour (sunrise, moon).</p>
        <p>To the best of our knowledge, this is the first study
to investigate if and how temporal information affects the
interpretation and prediction of emojis. We studied whether the
semantics of emojis change over different seasons, comparing
emoji embeddings trained on corpora from different seasons
(Spring, Summer, Autumn, Winter), and showed that some emojis
are used differently depending on the time of the year.
Moreover, we proposed a method to take the date information into
account in emoji prediction systems, slightly improving the
state of the art. We showed that, using the date information,
the prediction accuracy of some emojis can be improved. Some of
them are clearly time-dependent; others are not directly
associated with time, but time information nevertheless helps to
predict them.</p>
        <p>In the future we plan to study the semantics of emojis
over the day (morning/night) and over the week
(weekdays/weekend), and to improve the date information modules
by trying the two methods we proposed together (early + late).</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>Part of this work was done when Francesco B. interned at
Snap Inc. Francesco B. and Horacio S. acknowledge support from
the TUNER project (TIN2015-65308-C5-5-R, MINECO/FEDER, UE) and
the Maria de Maeztu Units of Excellence Programme
(MDM-2015-0502).</p>
    </sec>
    <sec id="sec-9">
      <title>References</title>
      <p>[BBRS18] Francesco Barbieri, Miguel Ballesteros,
Francesco Ronzano, and Horacio Saggion. Multimodal emoji
prediction. In Proceedings of NAACL: Short Papers, New Orleans,
US, 2018. Association for Computational Linguistics.</p>
      <p>[BBS17] Francesco Barbieri, Miguel Ballesteros, and
Horacio Saggion. Are emojis predictable? In EACL, 2017.</p>
      <p>[BCC18] Francesco Barbieri and Jose Camacho-Collados.
How Gender and Skin Tone Modifiers Affect Emoji Semantics in
Twitter. In Proceedings of the 7th Joint Conference on Lexical
and Computational Semantics (*SEM 2018), New Orleans, LA, United
States, 2018.</p>
      <p>[BKRS16] F. Barbieri, G. Kruszewski, F. Ronzano, and
H. Saggion. How cosmopolitan are emojis?: Exploring emoji usage
and meaning over different languages with distributional
semantics. In ACM Multimedia, 2016.</p>
      <p>[BRS16] F. Barbieri, F. Ronzano, and H. Saggion. What
does this emoji mean? A vector space skip-gram model for Twitter
emojis. In LREC, 2016.</p>
      <p>[CMS15] Spencer Cappallo, Thomas Mensink, and Cees GM
Snoek. Image2emoji: Zero-shot emoji prediction for visual media.
In Proceedings of the 23rd ACM International Conference on
Multimedia, pages 1311-1314. ACM, 2015.</p>
      <p>[CSG+18] Spencer Cappallo, Stacey Svetlichnaya, Pierre
Garrigues, Thomas Mensink, and Cees GM Snoek. The new modality:
Emoji challenges in prediction, anticipation, and retrieval.
arXiv preprint arXiv:1801.10253, 2018.</p>
      <p>[ERA+16] B. Eisner, T. Rocktaschel, I. Augenstein,
M. Bosnjak, and S. Riedel. emoji2vec: Learning emoji
representations from their description. CoRR, abs/1609.08359,
2016.</p>
      <p>[FMS+17] B. Felbo, A. Mislove, A. Sogaard, I. Rahwan,
and S. Lehmann. Using millions of emoji occurrences to learn
any-domain representations for detecting sentiment, emotion and
sarcasm. In EMNLP, 2017.</p>
      <p>[HGS+17] Tianran Hu, Han Guo, Hao Sun, Thuy-vy Thi
Nguyen, and Jiebo Luo. Spice up your chat: The intentions and
sentiment effects of using emoji. arXiv preprint
arXiv:1703.02860, 2017.</p>
      <p>[KK17] Mayu Kimura and Marie Katsurai. Automatic
construction of an emoji sentiment lexicon. In Proceedings of
the 2017 IEEE/ACM International Conference on Advances in Social
Networks Analysis and Mining, pages 1033-1036. ACM, 2017.</p>
      <p>[LLM+15] W. Ling, T. Luis, L. Marujo, R. F. Astudillo,
S. Amir, C. Dyer, A. W. Black, and I. Trancoso. Finding function
in form: Compositional character models for open vocabulary word
representation. In EMNLP, 2015.</p>
      <p>[MSC+13] T. Mikolov, I. Sutskever, K. Chen,
G. S. Corrado, and J. Dean. Distributed representations of words
and phrases and their compositionality. In NIPS, 2013.</p>
      <p>[NSSM15] Petra Kralj Novak, Jasmina Smailovic, Borut
Sluban, and Igor Mozetic. Sentiment of emojis. PLoS ONE, 10(12),
2015.</p>
      <p>[RPG+18] David Rodrigues, Marilia Prada, Rui Gaspar,
Margarida V. Garrido, and Diniz Lopes. Lisbon Emoji and Emoticon
Database (LEED): Norms for emoji and emoticons in seven
evaluative dimensions. Behavior Research Methods, 50(1):392-405,
2018.</p>
      <p>[WBSD17a] S. Wijeratne, L. Balasuriya, A. Sheth, and
D. Doran. A semantics-based measure of emoji similarity. In Web
Intelligence, 2017.</p>
      <p>[WBSD17b] Sanjaya Wijeratne, Lakshika Balasuriya, Amit
Sheth, and Derek Doran. EmojiNet: An open service and API for
emoji sense discovery. In International AAAI Conference on Web
and Social Media (ICWSM 2017), Montreal, Canada, 2017.</p>
      <p>[YYD+16] Z. Yang, D. Yang, C. Dyer, X. He, A. J. Smola,
and E. H. Hovy. Hierarchical attention networks for document
classification. In HLT-NAACL, 2016.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>