<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Skin Tone Emoji and Sentiment on Twitter</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>English Philology, University of Oulu</institution>
          ,
          <addr-line>90014 Oulu, Finland steven.coats (at) oulu.fi</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>In 2015, the Unicode Consortium introduced five skin tone emoji that can be used in combination with emoji representing human figures and body parts. In this study, use of the skin tone emoji is analyzed geographically in a large sample of data from Twitter. It can be shown that values for the skin tone emoji by country correspond approximately to the skin tone of the resident populations, and that a negative correlation exists between tweet sentiment and darker skin tone at the global level. In an era of large-scale migrations and continued sensitivity to questions of skin color and race, understanding how new language elements such as skin tone emoji are used can help frame our understanding of how people represent themselves and others in terms of a salient personal appearance attribute.</p>
      </abstract>
      <kwd-group>
        <kwd>Computer-mediated communication</kwd>
        <kwd>Corpus analysis</kwd>
        <kwd>Twitter</kwd>
        <kwd>Emoji</kwd>
        <kwd>Race/ethnicity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>Introduction</title>
      <sec id="sec-2-1">
        <title>Background</title>
        <p>
          Unicode code points are used not only to map the characters of the world’s languages,
but since 2009 also for emoji – characters that often depict faces or human forms.1
Introduced by Japanese telecommunications providers in the 1990s, emoji were
implemented in the popular iOS and Android mobile operating systems as well as on
social media platforms such as Facebook, Twitter, or Instagram shortly after their
canonization in the Unicode scheme. In 2015, the Unicode Consortium introduced a
new set of emoji characters that include code points allowing users to select from five
different skin tones, in addition to a default skin tone (usually yellow, Fig. 1), for a set
of emoji characters that depict persons and body parts [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The skin tones, derived
from the Fitzpatrick scale used in dermatology, are applied to a face or body-part
emoji by appending the Unicode code point for the skin tone to the code point for the
face or body part.
        </p>
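        <p>The composition of a skin-tone-modified emoji described above can be illustrated with a minimal Python sketch. The five modifier code points (U+1F3FB to U+1F3FF) are those defined by Unicode; the function name and the choice of the waving hand as base emoji are illustrative assumptions and not part of the study.</p>
        <preformat>
```python
# Fitzpatrick skin tone modifiers occupy U+1F3FB..U+1F3FF in Unicode.
SKIN_TONE_MODIFIERS = {
    "light": "\U0001F3FB",
    "medium-light": "\U0001F3FC",
    "medium": "\U0001F3FD",
    "medium-dark": "\U0001F3FE",
    "dark": "\U0001F3FF",
}


def apply_skin_tone(base_emoji: str, tone: str) -> str:
    """Append the modifier code point for `tone` to a base emoji (hypothetical helper)."""
    return base_emoji + SKIN_TONE_MODIFIERS[tone]


WAVING_HAND = "\U0001F44B"  # U+1F44B WAVING HAND, chosen only as an example
modified = apply_skin_tone(WAVING_HAND, "medium")

# The result is a sequence of two code points: the base, then the modifier.
print([f"U+{ord(ch):04X}" for ch in modified])  # ['U+1F44B', 'U+1F3FD']
```
</preformat>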
        <p>In this study, the use of skin tone emoji is investigated in a large global dataset of messages
collected from Twitter. After characterizing the global distribution of
skin tone emoji, a sentiment analysis is conducted. The correlation of skin tone emoji
and sentiment may reflect demographic and economic realities but can also shed light
on evolving attitudes towards skin color, race and ethnicity.</p>
        <p>
          Sentiment analysis, or the automatic extraction of opinions or emotions from text
data, is an important topic in Natural Language Processing. Approaches in sentiment
analysis range from lexicon-based frequency counts (the “bag-of-words” model) to
the use of machine learning techniques based on the automatic extraction of features
in multi-dimensional vector space or the use of neural networks (for an overview, see
[
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]). The approach adopted in this paper utilizes an existing emoji sentiment
classification scale [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] to annotate tweets with sentiment.
        </p>
        <p>The next section describes related work on emoji and skin tone emoji, as well as
methods for sentiment analysis relevant to the present research. In Section 3, the
collection and processing of a data set from the Twitter APIs and the tools and methods
used to undertake the analysis are introduced. In Section 4, the results of two
experiments are presented. In Section 5, the results are interpreted, a preliminary conclusion
is reached, and an outlook for further investigation of skin tone emoji is offered.</p>
        <p>1 A list of emoji can be found at https://emojipedia.org.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <sec id="sec-3-1">
        <title>Work on Emoticons and Emoji in Twitter</title>
        <p>
          Due to the newness of the phenomenon, analyses of skin tone emoji use are relatively
few, but some research has investigated patterns of emoji usage in general.
Emoticons, older ASCII-character sequences used to represent mainly facial expressions,
have a longer history in Computer-mediated Communication (CMC), and have been
subject to several analyses, including of their use on Twitter [
          <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7">4, 5, 6, 7</xref>
          ].
        </p>
        <p>
          For emoji, Barbieri et al. [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] used vector space representations to compare the
meanings of emoji in Twitter corpora of American English, British English,
peninsular Spanish and Italian. They note that while the semantics of emoji across languages
and varieties are relatively stable, some emoji are used quite differently in the
corpora.
        </p>
        <p>
          McGill [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] drew attention to the underrepresentation of lighter skin-tone emoji in
the United States, and suggested that while the default yellow skin tone may be used
by some as a stand-in for lighter skin tones, people of European descent in the United
States may also be fearful of asserting their racial identity.
        </p>
        <p>
          Kralj-Novak et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] engaged annotators to rate the sentiment of Twitter messages
containing emoji in 13 languages. The derived sentiment values for individual emoji
are utilized in Section 4 to assign sentiment to the data collected for this study.
        </p>
        <p>
          Ljubešić and Fišer [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] demonstrated that Twitter users who make use of emoji
tend to be more active on the platform than non-emoji users, as well as have more
followers and friends. They note that the “Emoji modifier Fitzpatrick type-1-2”,
encoding light skin tone, is one of the most frequent emoji in their data set, comprising
2.3% of all emoji forms (85). In terms of geographic distribution, they note that
clustering nations on the basis of emoji probability distributions results in a stratification
of the skin tone emoji, with lighter skin tones among the most characteristic types in
“first- and second-world” nations and darker skin tones more characteristic for the
“fourth-world” cluster comprising mainly African nations (86–87).
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Twitter Sentiment Analysis</title>
        <p>
          Many sentiment analysis studies have utilized data from Twitter [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ], and
sentiment analysis of monolingual labelled data can typically attain high rates of precision
and accuracy. Sentiment analysis of multilingual data, on the other hand, poses
various problems: For some languages there are no existing resources such as sentiment
lexicons or sentiment-labelled corpora with which supervised models could be
trained. Where multilingual sentiment analysis has been undertaken, it often targets
specific language pairs or a small number of languages. Even if sentiment-labelled
corpora exist, low levels of annotator agreement can place an upper limit on the
accuracy of models [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>
          Emoticons and emoji can be utilized in unsupervised sentiment analysis because
they are used across many languages. Tang et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], for example,
used ASCII-based emoticons in Twitter messages to create a sentiment classifier
using neural networks. Jiang et al. [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] used machine learning to create an “Emotion
Space Model” from emoji-containing data obtained from Sina Weibo (a Chinese
microblogging service similar to Twitter).
        </p>
        <p>In this study, a similar approach has been adopted. Manual annotation of the tweets
in the data was not undertaken. Rather, sentiment values were assigned on the basis of
the emoji present in each tweet, using the Kralj-Novak et al. emoji sentiment lexicon. Examination
of the labeled data suggests that the approach can offer acceptable results.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Data Collection and Processing</title>
      <p>A total of 653,457,659 tweets with “place” metadata were collected from the Twitter Streaming
API from November 2016 until June 2017 and stored on servers operated by Finland’s
Centre for Scientific Computing.2 From Unicode’s list of all emoji,3 regular
expressions were used to identify the 102 unique emoji types that can be used with skin tone
modifiers on the Twitter platform (as of late 2017).</p>
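      <p>How a regular expression can pick out emoji that may take a skin tone modifier, and whether a modifier is actually present, is sketched below in Python. The three base emoji are an illustrative subset standing in for the 102 types identified in the study, and the pattern and function name are assumptions made for this example rather than the expressions actually used.</p>
      <preformat>
```python
import re

# Illustrative subset only; the study identified 102 modifiable emoji types.
MODIFIABLE_BASES = "\U0001F44B\U0001F44D\U0001F64F"  # waving hand, thumbs up, folded hands
MODIFIERS = "\U0001F3FB-\U0001F3FF"  # range covering the five skin tone modifiers

# A modifiable base emoji, optionally followed by one skin tone modifier.
PATTERN = re.compile(f"([{MODIFIABLE_BASES}])([{MODIFIERS}])?")


def count_skin_tone_use(text):
    """Return (potential skin tone emoji, emoji that carry a modifier)."""
    potential = 0
    modified = 0
    for match in PATTERN.finditer(text):
        potential += 1
        if match.group(2) is not None:
            modified += 1
    return potential, modified


tweet = "great job \U0001F44D\U0001F3FE see you \U0001F44B"
print(count_skin_tone_use(tweet))  # (2, 1)
```
</preformat>
      <p>Summing the two counts over all tweets from a country would give the kind of proportion of skin-tone-possible emoji that carry a skin tone reported in the analysis.</p>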
    </sec>
    <sec id="sec-5">
      <title>Analysis</title>
      <p>In a first analysis, the prevalence of use of the skin tone emoji was determined by
country and the median skin tone values calculated and mapped. Semantic properties
of the skin tone emoji were investigated using vector representations, and the
relationship between mean skin tone values and sentiment was considered by aggregating
tweets at the level of country or territory.</p>
      <sec id="sec-5-1">
        <title>Geographic Distribution of Skin Tone Emoji</title>
        <p>For each of the 247 country-level administrative units in the data, frequencies of the
default emoji and the skin-tone modified emoji were calculated (Table 1 summarizes
the results for the 10 countries with the most tweets).</p>
        <p>2 For these data, a high correlation exists between “place” latitude-longitude coordinates
and “geo” latitude-longitude coordinates for tweets that contain both metadata fields. As
tweets with “place” attributes are far more numerous than tweets with “geo” attributes, they
are considered to be an accurate indication of user location.</p>
        <p>3 http://unicode.org/emoji/charts/full-emoji-list.html</p>
        <p>Globally, more than 25 million tweets contained emoji that could take skin tone
values, and these tweets contained approximately 19 million skin tone emoji. 54.3%
of tweets with at least one potential skin tone emoji had an emoji with an assigned
skin tone value, and 50.1% of potential skin tone emoji had skin tone values. Users in
the United States, the country of origin of Twitter, of the Unicode standard, and of the
skin-tone emoji, are more likely than users elsewhere to use the skin tone modifiers. Users in Anglophone countries
such as Britain and the Philippines also use a relatively large number of skin tone emoji. The
proportion of skin-tone-possible emoji that were assigned a skin tone, by country,
is shown in Fig. 2.</p>
        <p>Fig. 2. Proportion of emoji with skin tone values. Green shades indicate a higher
proportion of skin-tone emoji; red shades a lower proportion.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Median Skin Tone Values</title>
        <p>
          The global distribution of skin tone values was as follows: light, 36%; medium-light,
25%; medium, 20%; medium-dark, 16%; dark, 3%. To some extent, the median skin
tone value by country/territory (Fig. 3) corresponds with levels of yearly insolation,
which in turn affects the average level of skin pigmentation in ancestral human
population groups (Fig. 4). Lighter-than-expected skin tone values in Asian countries may
reflect cultural values associating lighter skin with health and beauty. The higher
value for Afghanistan may be due to the presence of U.S. military personnel in the
country. Darker-than-expected skin tone values in the United States may reflect the
disproportionate popularity of Twitter among African-Americans (see [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]). For Europe,
darker median skin tone values may reflect enthusiasm for African or
African-American popular culture, or migration.
        </p>
      </sec>
      <sec id="sec-5-3">
        <title>Emoji Skin Tone and Sentiment</title>
        <p>Two methods were used to investigate the sentiment of the corpus. In a first
experiment, word embeddings in multidimensional vector space were created to identify
lexical items close to the skin tone emoji in meaning. In a second experiment,
sentiment per tweet was calculated by utilizing the Kralj-Novak emoji sentiment
classification lexicon.</p>
      </sec>
      <sec id="sec-5-4">
        <title>Word Embeddings in Multidimensional Space</title>
        <p>
          Recent work in many types of Natural Language Processing has seen widespread use
of word embeddings for tasks ranging from translation to content extraction,
part-of-speech tagging, parsing, or sentiment analysis. The basic principle underlying these
approaches was alluded to by Firth’s dictum that one shall “know a word by the
company it keeps” [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. First formally proposed by Harris [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and sometimes referred to
as the “Distributional Hypothesis”, it refers to the fact that linguistic elements that
show similar collocational and syntactic distributions often exhibit similar semantics;
measures such as pointwise mutual information can be incorporated into models that
quantify the collocational properties of words or n-grams. In word embedding models,
the words in a document or set of documents can be transformed into vectors based on
the probability of their co-occurrence within a specified span.
        </p>
        <p>
          The 25,297,245 tweets that contained emoji that could potentially take on skin-tone
values were used to create word embeddings: all unique tokens in these tweets were
assigned vectors in a 400-dimensional space based on a continuous
bag-of-words model with a window of five tokens to the right and left and a minimum of 10
token occurrences in the corpus, using an implementation of the Word2Vec algorithm
[
          <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
          ]. A preliminary insight into the differences in meaning the skin tone emoji can
entail is provided by examining the tokens closest to the skin tone code points in the
resulting vector space (Fig. 5).
        </p>
        <p>The cosine similarity of the vectors for a token pair can
range in value from -1 (opposite vectors, semantics very different) to 1 (vectors
pointing in the same direction, semantics very similar). Among the most similar tokens for all five types are
the other skin tone emoji. This is to be expected, given
that these characters occur in the same contexts (as code points following a limited set
of face- or body-part code points), and suggests that the semantic value of the skin tone emoji
is well represented by the vector space model.</p>
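        <p>The similarity measure can be made concrete with a short Python sketch. The three-dimensional toy vectors are assumptions made for illustration; the embeddings in the study have 400 dimensions.</p>
        <preformat>
```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


a = [1.0, 2.0, 3.0]            # toy vector, not taken from the study
same = cosine_similarity(a, a)                         # close to 1
opposite = cosine_similarity(a, [-x for x in a])       # close to -1
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # exactly 0
print(same, opposite, orthogonal)
```
</preformat>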
        <p>The types most similar to each of the skin tones in terms of cosine similarity give
some insight into the contexts of use of each of the skin tone emoji and hence their
meanings. Light skin tone values are associated with emoji that express affection,
satisfaction, or happiness. Medium-light skin tones are associated with mainly
positive emoji expressing affection or irreverence. Medium skin tone emoji are closest to
emoji with negative affective connotations such as crying and shouting faces, the
two-eyes emoji (possibly used as an expression of incredulity), and a skull emoji, as well
as a character with the numeral 100 and the positive “face with tears of joy” emoji.4
Medium-dark skin tones are additionally associated with the informal
English-language words lol, tho and bruh. The dark skin tone is closest to 100, the two eyes,
the skull, and emoji representing fire, prayer, and speaking.</p>
        <p>4 The 100 emoji was originally used in Japanese mobile communications to indicate a
teacher’s mark of 100 points for a school assignment, but in American usage is likely related to
keep it 100, a phrase meaning to keep it real, or “be honest/authentic”.</p>
      </sec>
      <sec id="sec-5-5">
        <title>Labelling Sentiment of Tweets</title>
        <p>
          Kralj-Novak et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] provide frequency information for the annotation of tweets
containing emoji in 13 languages as “negative”, “neutral”, or “positive” by manual
annotators.5 Examples of emoji with positive and negative sentiment values are
shown in Table 2: Faces expressing, for example, indifference or anger have negative
values, while faces expressing affection, as well as flowers and gifts, have positive values.6 From
this lexicon, scores for individual emoji were calculated by subtracting the number of
negatively evaluated sentences containing a particular emoji from the number of
positively evaluated sentences containing the same emoji and dividing by the total number of
occurrences of that emoji.
Emoji with at least 50 occurrences in the Kralj-Novak data were used to evaluate the
sentiment of the 653.5 million tweets in the experimental data. Tweets were cleaned
of usernames, hashtags and URLs, converted to lower case, and tokenized using the
NLTK Twitter Tokenizer [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], the Jieba tokenizer for Mandarin [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ] and the Tiny
Segmenter for Japanese [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ], then scored using the Kralj-Novak sentiment scale.
Examples are shown in Fig. 6.7
        </p>
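        <p>The scoring procedure can be sketched in Python as follows. The per-emoji score follows the description above: positive minus negative annotations, divided by total occurrences, with a threshold of 50 occurrences. Two points are assumptions made for this example: the total is taken to be the sum of the negative, neutral and positive counts, and a tweet is scored as the mean of the scores of the lexicon emoji it contains, since the exact aggregation is not spelled out here. The annotation counts are invented for illustration.</p>
        <preformat>
```python
MIN_OCCURRENCES = 50  # threshold stated in the text


def build_lexicon(counts):
    """Map emoji to (positive - negative) / total for sufficiently frequent emoji.

    `counts` maps an emoji to (negative, neutral, positive) annotation counts.
    """
    lexicon = {}
    for emoji, (neg, neu, pos) in counts.items():
        total = neg + neu + pos  # assumption: total = sum of the three labels
        if total >= MIN_OCCURRENCES:
            lexicon[emoji] = (pos - neg) / total
    return lexicon


def tweet_sentiment(tokens, lexicon):
    """Mean score of the lexicon emoji in a tweet (assumed aggregation)."""
    scores = [lexicon[tok] for tok in tokens if tok in lexicon]
    if not scores:
        return None  # tweet contains no scored emoji
    return sum(scores) / len(scores)


# Invented counts, for illustration only.
toy_counts = {
    "\U0001F60D": (10, 20, 70),  # smiling face with heart-eyes
    "\U0001F621": (60, 30, 10),  # pouting face
    "\U0001F381": (1, 2, 3),     # wrapped gift: too rare, dropped
}
toy_lexicon = build_lexicon(toy_counts)
tokens = ["so", "happy", "\U0001F60D", "\U0001F60D", "\U0001F621"]
print(tweet_sentiment(tokens, toy_lexicon))
```
</preformat>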
        <p>5 http://kt.ijs.si/data/Emoji_sentiment_ranking/</p>
        <p>6 The “Japanese Dolls” emoji represents figures used in a traditional Japanese observance.</p>
        <p>
          7 Negative emoji are less frequently used in the data. Negative sentiment, in general, is less
frequently expressed [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ].
        </p>
        <p>Fig. 6. Four tweets in which country of origin, automatically detected language, original text,
calculated emoji-sentiment value, and tokens after cleaning are shown.</p>
      </sec>
      <sec id="sec-5-6">
        <title>Correlation of Skin Tone and Sentiment</title>
        <p>Mean skin tone per tweet was calculated by assigning the values 1 to 5 to the skin
tone emoji, then for each tweet, dividing the sum of the skin tone emoji values by the
number of skin tone emoji.8 Mean skin tone and sentiment were correlated for all
tweets containing at least one skin tone emoji, both for the entire dataset and at the
country/territory level, using Pearson’s product-moment correlation. For the entire
dataset, a weak negative correlation between sentiment and skin tone values was found
(r = -0.09, df = 13,736,953, p &lt; 10<sup>-32</sup>).</p>
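        <p>The two calculations described in this paragraph, mean skin tone per tweet and Pearson’s product-moment correlation, can be sketched in Python. The mapping of modifiers to the values 1 to 5 follows the scheme described in the text; the function names and the toy data are assumptions made for illustration.</p>
        <preformat>
```python
import math

# Skin tone modifiers mapped to the values 1 (light) to 5 (dark).
TONE_VALUES = {
    "\U0001F3FB": 1,
    "\U0001F3FC": 2,
    "\U0001F3FD": 3,
    "\U0001F3FE": 4,
    "\U0001F3FF": 5,
}


def mean_skin_tone(text):
    """Sum of skin tone values in a tweet divided by the number of skin tone emoji."""
    values = [TONE_VALUES[ch] for ch in text if ch in TONE_VALUES]
    if not values:
        return None  # tweet contains no skin tone emoji
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Toy tweet with one light and one medium modifier: (1 + 3) / 2
tweet = "\U0001F44D\U0001F3FB \U0001F44B\U0001F3FD"
print(mean_skin_tone(tweet))  # 2.0

# Toy paired values (skin tone, sentiment) with a perfectly negative relationship.
print(pearson_r([1, 2, 3], [3, 2, 1]))  # -1.0
```
</preformat>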
        <p>At the level of country/territory, the correlation between mean sentiment and mean
skin tone value was more strongly negative, at r = -0.25 (df = 235, p = 0.000076) for
the 237 countries or territories with at least one skin tone emoji (Fig. 7).</p>
        <p>8 The values assigned were: light skin tone = 1, medium-light skin tone = 2, medium skin tone
= 3, medium-dark skin tone = 4, dark skin tone = 5.</p>
        <p>Fig. 7. Correlation between Mean Tweet Sentiment and Mean Tweet Skin Tone for 237
Countries/Territories (Shaded Area = 95% Confidence Interval)</p>
        <p>To mitigate the effects of small sample size (e.g. for countries in which only one or a
few users contributed most or all of the skin-tone emoji), the model was refitted for
the 50 countries/territories with the highest number of tweets (Fig. 8), with the result
that the negative relationship between mean tweet sentiment and mean tweet skin tone
strengthened to r = -0.28 (df = 48, p = 0.051).</p>
        <p>Fig. 8. Correlation between Mean Tweet Sentiment and Mean Tweet Skin Tone for 50
Countries with the Largest Number of Tweets (Shaded Area = 95% Confidence Interval)</p>
        <p>The countries with lighter mean skin tone values, such as Egypt, Paraguay, Qatar,
Indonesia, or the UAE, have higher mean sentiment scores. European countries, with
mean skin tone values ranging from approximately 1.7 to 2.25, have middling
sentiment values that fall within the 95% confidence interval. The countries with high
mean skin tone values, such as the United States, Nigeria, or Kenya, have lower mean
sentiment values.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Summary and Discussion</title>
      <p>
        Since their introduction into the Unicode scheme in 2015, skin tone emoji have
become a widely used resource on the Twitter platform. Their global distribution,
semantic properties, and patterning with tweet sentiment were investigated in a large
corpus of tweets containing geographical metadata by using word embeddings and an
emoji sentiment lexicon [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Some caveats apply when using the Kralj-Novak et al.
lexicon: The number of emoji in the Unicode Standard has increased considerably
since the creation of the resource. For example, an emoji such as U+1F595
(REVERSED HAND WITH MIDDLE FINGER EXTENDED), introduced with Unicode
7.0 in 2014, does not have a value in the classification scheme, although it is likely to
be used to mark negative affect. Likewise, emoji introduced since 2015, such as
sequences consisting of two or more code points joined together (used to represent e.g.
flags or groups of persons), are not in the lexicon, nor are the skin tone emoji
themselves. Nonetheless, the broad coverage of the lexicon, in which most of the
emoji which can be paired with skin tone emoji are assigned a sentiment value, makes
sentiment inference of tweets containing skin tone emoji feasible. The geographical
distribution of skin tone emoji, their semantics, and the correlation of skin tone and
sentiment suggest several preliminary interpretations.
      </p>
      <p>
        The global distribution of skin tone emoji shows that to a certain extent, the
characters are being used on Twitter as intended by the originators of the Unicode
proposal: to make it possible for people to represent their own skin color in online
communication. While darker skin tone emoji are used in Africa and the United States,
lighter skin tones are more numerous globally and are more likely to be used in Asia,
the Middle East, and parts of Latin America and Europe. For the U.S., the prevalence
of darker skin tone emoji may in part be explained by the popularity of Twitter among
African-Americans, who are overrepresented on the platform compared to their share
of the population [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. For Europe, darker emoji skin tones may indicate a youthful
Twitter user population: in general, younger users are more likely to utilize
non-standard linguistic resources such as emoji in CMC [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], and in Europe younger
people (presumably including some Twitter users) are more likely to come from
immigrant backgrounds. In Asia, the Middle East, and parts of Latin America, lighter
skin tone emoji may reflect cultural norms concerning the physical attributes of health
and beauty in which skin color can play an important role [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], a fact that has been
documented in research into body satisfaction [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], attractiveness ratings [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], or use
of skin whitening products, particularly by females [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ].
      </p>
      <p>
        The semantics of skin tone emoji are, in part, manifest in a multidimensional
vector space model. Lighter skin tone emoji are more similar in their collocational
properties (and hence semantics) to other emoji that can be interpreted as expressing
generally positive affect, such as smiling faces and heart symbols, while darker skin tone
emoji are more closely associated with symbols that express other affective states,
including distress, as well as non-standard word forms. The finding corresponds to
that of Ljubešić and Fišer [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], who do not investigate sentiment or emoji skin tone
directly, but cluster countries based on their emoji distributions. They note that the
two darkest skin-tone values, as well as several emoji depicting unhappy faces,
belong to a “fourth world” cluster of mainly African countries.
      </p>
      <p>
        The association between darker emoji skin tone and (possible) negative affect is
also manifest when mean emoji skin tone and sentiment are correlated.
The negative relationship between sentiment and skin tone is weak when all tweets
with skin tone emoji are considered, having a value of r = -0.09. Because there are so
many more skin tone emoji tweets in the data from the United States than from other
countries, the value is almost the same as that for the United States data alone. When
considered at country- or territorial level, however, the value is more strongly
negative at r = -0.25, increasing in strength to r = -0.28 when only the 50 countries with
the most tweets are considered. The negative association between sentiment and
emoji skin tone in this data is in accord with the implicit findings of the Ljubešić and
Fišer study, and parallels the results of survey-based measures of happiness by
country, in which many countries of the developing world, primarily in Africa, report low
levels of well-being and happiness [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ].
      </p>
      <p>
        The large amount of data collected in this study makes more specific country-level
analyses of skin tone emoji use possible. Considering the fact that language- and
geography-based differences in emoji usage and meaning have been found [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the
semantics of skin tone emoji in particular languages, countries, or geographical contexts
could be more closely examined using vector spaces. Other future work could include
updates and refinements to the emoji sentiment lexicon, as well as the utilization of
more sophisticated sentiment models based on machine learning techniques such as support vector
machines or neural networks. Parsed data containing skin tone emoji could be analyzed
to consider evaluative use of skin tone emoji.
      </p>
      <p>As skin tone emoji continue to gain in popularity worldwide, techniques for
measuring and evaluating the ways in which they are used are likely to play a role in NLP
tasks pertaining to information extraction in bi- and multilingual contexts. In a
broader perspective, the analysis of skin tone emoji use can give insight into how
humans represent themselves on social media, what kinds of attitudes and meanings are
associated with skin color, and how language is used to depict the phenotypical
diversity of the shared human condition.</p>
      <sec id="sec-6-1">
        <title>Acknowledgement</title>
        <p>The author thanks Finland’s Centre for Scientific Computing (CSC) for providing
access to computational and data storage facilities.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Edberg</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Unicode emoji (Unicode Technical Standard #51)</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dashtipour</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poria</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hussain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cambria</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hawalah</surname>
            ,
            <given-names>A. Y. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gelbukh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>Multilingual sentiment analysis: State of the art and independent comparison of techniques</article-title>
          .
          <source>Cognitive Computation</source>
          <volume>8</volume>
          ,
          <fpage>757</fpage>
          -
          <lpage>771</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kralj-Novak</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smailović</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sluban</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Mozetič</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Sentiment of emojis</article-title>
          .
          <source>PLoS ONE</source>
          <volume>10</volume>
          (
          <issue>12</issue>
          ) (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bamman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eisenstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schnoebelen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Gender identity and lexical variation in social media</article-title>
          .
          <source>Journal of Sociolinguistics</source>
          <volume>18</volume>
          ,
          <fpage>135</fpage>
          -
          <lpage>160</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Schnoebelen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Do you smile with your nose? Stylistic variation in Twitter emoticons</article-title>
          .
          <source>University of Pennsylvania Working Papers in Linguistics</source>
          <volume>18</volume>
          (
          <issue>2</issue>
          ),
          <fpage>115</fpage>
          -
          <lpage>125</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Coats</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Grammatical feature frequencies of English on Twitter in Finland</article-title>
          . In:
          <string-name>
            <surname>Squires</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          (ed.),
          <source>English in computer-mediated communication: Variation, representation, and change</source>
          , pp.
          <fpage>179</fpage>
          -
          <lpage>210</lpage>
          . De Gruyter Mouton, Berlin (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Pavalanathan</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eisenstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Emoticons vs. emojis on Twitter: A causal inference approach</article-title>
          .
          <source>arXiv:1510.08480v1 [cs.CL]</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kruszewski</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ronzano</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saggion</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>How cosmopolitan are emojis? Exploring emojis usage and meaning over different languages with distributional semantics</article-title>
          .
          <source>MM '16: Proceedings of the 2016 ACM on Multimedia Conference, Amsterdam, the Netherlands, October 15-19, 2016</source>
          , pp.
          <fpage>531</fpage>
          -
          <lpage>535</lpage>
          . ACM, New York (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>McGill</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Why White people don't use white emoji</article-title>
          .
          <source>The Atlantic</source>
          . https://www.theatlantic.com/politics/archive/2016/05/white-people-dont-use-white-emoji/481695/ (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ljubešić</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fišer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>A Global Analysis of Emoji Usage</article-title>
          .
          In:
          <source>Proceedings of the 10th Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task</source>
          , pp.
          <fpage>82</fpage>
          -
          <lpage>89</lpage>
          , Berlin, Germany, August 7-12, 2016. Association for Computational Linguistics, Stroudsburg, PA (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pak</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paroubek</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Twitter as a corpus for sentiment analysis and opinion mining</article-title>
          . In:
          <string-name>
            <surname>Calzolari</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choukri</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maegaard</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mariani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Odijk</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piperidis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tapias</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (eds.),
          <source>Proceedings of LREC</source>
          , pp.
          <fpage>1320</fpage>
          -
          <lpage>1326</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kolchyna</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Treleaven</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aste</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Twitter sentiment analysis: Lexicon method, machine learning method and their combination</article-title>
          .
          <source>arXiv: 1507.00995v3 [cs.CL]</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mozetič</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grčar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smailović</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Multilingual Twitter sentiment classification: The role of human annotators</article-title>
          .
          <source>PLoS ONE</source>
          <volume>11</volume>
          (
          <issue>5</issue>
          )
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qin</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Building a large-scale Twitter-specific sentiment lexicon: A representation learning approach</article-title>
          .
          In:
          <source>COLING 2014</source>
          , pp.
          <fpage>172</fpage>
          -
          <lpage>182</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.-Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luan</surname>
            ,
            <given-names>H.-B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.-S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>S.-P.</given-names>
          </string-name>
          :
          <article-title>Microblog sentiment analysis with emotion space model</article-title>
          .
          <source>Journal of Computer Science and Technology</source>
          <volume>30</volume>
          (
          <issue>5</issue>
          ),
          <fpage>1120</fpage>
          -
          <lpage>1129</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Duggan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Mobile messaging and social media 2015</article-title>
          . Report, Pew Research Center (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Chaplin</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Geographic distribution of environmental factors influencing human skin coloration</article-title>
          .
          <source>American Journal of Physical Anthropology</source>
          <volume>125</volume>
          ,
          <fpage>292</fpage>
          -
          <lpage>302</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Firth</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          :
          <source>Papers in linguistics, 1934-1951</source>
          . Oxford University Press, London (
          <year>1957</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <source>Mathematical structures of language</source>
          . Interscience, New York (
          <year>1968</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yih</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweig</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Linguistic regularities in continuous space word representations</article-title>
          .
          In:
          <source>Proceedings of HLT-NAACL 13</source>
          , pp.
          <fpage>746</fpage>
          -
          <lpage>751</lpage>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Řehůřek</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sojka</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Software framework for topic modelling with large corpora</article-title>
          .
          In:
          <source>Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</source>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <source>Natural language processing with Python</source>
          . O'Reilly Media Inc., Newton, MA (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Jieba: Chinese word segmentation module</article-title>
          . https://github.com/fxsjy/jieba (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Hagiwara</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Tinysegmenter: Tokenizer specified for Japanese</article-title>
          . https://github.com/SamuraiT/tinysegmenter (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Dodds</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Desu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reagan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Human language reveals a universal positivity bias</article-title>
          .
          <source>PNAS</source>
          <volume>112</volume>
          (
          <issue>8</issue>
          ),
          <fpage>2389</fpage>
          -
          <lpage>2394</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Pavalanathan</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eisenstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Confounds and consequences in geotagged Twitter data</article-title>
          .
          <source>arXiv:1506.02275v2 [cs.CL]</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Min</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belk</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kimura</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahl</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Skin lightening and beauty in four Asian cultures</article-title>
          . In:
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (eds.),
          <source>Advances in Consumer Research</source>
          Volume
          <volume>35</volume>
          , pp.
          <fpage>444</fpage>
          -
          <lpage>449</lpage>
          . Association for Consumer Research, Duluth, MN (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Sahay</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piran</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Skin-color preferences and body satisfaction among South Asian-Canadian and European-Canadian female university students</article-title>
          .
          <source>Journal of Social Psychology</source>
          <volume>137</volume>
          (
          <issue>2</issue>
          ),
          <fpage>161</fpage>
          -
          <lpage>171</lpage>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Swami</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furnham</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The influence of skin tone, hair length, and hair colour on ratings of women's physical attractiveness, health and fertility</article-title>
          .
          <source>Scandinavian Journal of Psychology</source>
          <volume>49</volume>
          ,
          <fpage>429</fpage>
          -
          <lpage>437</lpage>
          (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Peltzer</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pengpid</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>James</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The globalization of whitening: prevalence of skin lighteners (or bleachers) use and its social correlates among university students in 26 countries</article-title>
          .
          <source>International Journal of Dermatology</source>
          <volume>55</volume>
          (
          <issue>2</issue>
          ),
          <fpage>165</fpage>
          -
          <lpage>172</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Helliwell</surname>
            ,
            <given-names>J. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The social foundations of world happiness</article-title>
          . In:
          <string-name>
            <surname>Helliwell</surname>
            ,
            <given-names>J. F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Layard</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sachs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (eds.),
          <source>World Happiness Report 2017</source>
          . Columbia University Center for Sustainable Development, New York (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>