=Paper= {{Paper |id=Vol-2084/paper10 |storemode=property |title=Skin Tone Emoji and Sentiment on Twitter |pdfUrl=https://ceur-ws.org/Vol-2084/paper10.pdf |volume=Vol-2084 |authors=Steven Coats |dblpUrl=https://dblp.org/rec/conf/dhn/Coats18 }} ==Skin Tone Emoji and Sentiment on Twitter== https://ceur-ws.org/Vol-2084/paper10.pdf
              Skin Tone Emoji and Sentiment on Twitter
                                   Steven Coats1[0000-0002-7295-3893]
                   1
                       English Philology, University of Oulu, 90014 Oulu, Finland
                                 steven.coats (at) oulu.fi



         Abstract. In 2015, the Unicode Consortium introduced five skin tone emoji that
         can be used in combination with emoji representing human figures and body
         parts. In this study, use of the skin tone emoji is analyzed geographically in a
         large sample of data from Twitter. It can be shown that values for the skin tone
         emoji by country correspond approximately to the skin tone of the resident
         populations, and that a negative correlation exists between tweet sentiment and
         darker skin tone at the global level. In an era of large-scale migrations and con-
         tinued sensitivity to questions of skin color and race, understanding how new
         language elements such as skin tone emoji are used can help frame our under-
         standing of how people represent themselves and others in terms of a salient
         personal appearance attribute.


         Keywords: Computer-mediated communication, Corpus analysis, Twitter,
         Emoji, Race/ethnicity


1        Introduction

1.1      Background
Unicode code points are used not only to map the characters of the world’s languages,
but since 2009 also for emoji – characters that often depict faces or human forms.1
Introduced by Japanese telecommunications providers in the 1990s, emoji were im-
plemented in the popular iOS and Android mobile operating systems as well as on
Social Media platforms such as Facebook, Twitter, or Instagram shortly after their
canonization in the Unicode scheme. In 2015 the Unicode consortium introduced a
new set of emoji characters that include code points allowing users to select from five
different skin tones, in addition to a default skin tone (usually yellow, Fig. 1), for a set
of emoji characters that depict persons and body parts [1]. The skin tones, derived
from the Fitzpatrick scale used in dermatology, are applied to a face or body-part
emoji by appending the Unicode code point for the skin tone to the code point for the
face or body part.

   In this study the use of the skin tone emoji in a large global dataset of messages
collected from Twitter is investigated. After characterizing the global distribution of
skin tone emoji, a sentiment analysis is conducted. The correlation of skin tone emoji

1
    A list of can be found at https://emojipedia.org.
2


and sentiment may reflect demographic and economic realities but can also shed light
on evolving attitudes towards skin color, race and ethnicity.




      Fig. 1. Twitter emoji in default yellow and in the five Fitzpatrick scale-derived values

   Sentiment analysis, or the automatic extraction of opinions or emotions from text
data, is an important topic in Natural Language Processing. Approaches in sentiment
analysis range from lexicon-based frequency counts (the “bag-of-words” model) to
the use of machine learning techniques based on the automatic extraction of features
in multi-dimensional vector space or the use of neural networks (for an overview, see
[2]). The approach adopted in this paper utilizes an existing emoji sentiment classifi-
cation scale [3] to annotate tweets with sentiment.


1.2     Organization of the Text

In the next section related work on emoji and skin tone emoji is described, as well as
methods for sentiment analysis relevant to the present research. In Section 3, the col-
lection and processing of a data set from the Twitter APIs and the tools and methods
used to undertake the analysis are introduced. In Section 4, the results of two experi-
ments are presented. In Section 5, the results are interpreted, a preliminary conclusion
is reached, and an outlook for further investigation of skin tone emoji is offered.


2       Related Work

2.1     Work on Emoticons and Emoji in Twitter
Due to the newness of the phenomenon, analyses of skin tone emoji use are relatively
few, but some research has investigated patterns of emoji usage in general. Emoti-
cons, older ASCII-character sequences used to represent mainly facial expressions,
have a longer history in Computer-mediated Communication (CMC), and have been
subject to several analyses, including of their use on Twitter [4, 5, 6, 7].

    For emoji, Barbieri et al. [8] used vector space representations to compare the
meanings of emoji in Twitter corpora of American English, British English, peninsu-
lar Spanish and Italian. They note that while the semantics of emoji across languages
and varieties are relatively stable, some emoji are used quite differently in the corpo-
ra.
                                                                                        3


   McGill [9] drew attention to the underrepresentation of lighter skin-tone emoji in
the United States, and suggested that while the default yellow skin tone may be used
by some as a stand-in for lighter skin tones, people of European descent in the United
States may also be fearful of asserting their racial identity.

   Kralj-Novak et al. [3] engaged annotators to rate the sentiment of Twitter messages
containing emoji in 13 languages. The derived sentiment values for individual emoji
are utilized in Section 4 to assign sentiment to the data collected for this study.

    Ljubešić and Fišer [10] demonstrated that Twitter users who make use of emoji
tend to be more active on the platform than non-emoji users, as well as have more
followers and friends. They note that the “Emoji modifier Fitzpatrick type-1-2”, en-
coding light skin tone, is one of the most frequent emoji in their data set, comprising
2.3% of all emoji forms (85). In terms of geographic distribution, they note that clus-
tering nations on the basis of emoji probability distributions results in a stratification
of the skin tone emoji, with lighter skin tones among the most characteristic types in
“first- and second-world” nations and darker skin tones more characteristic for the
“fourth-world” cluster comprising mainly African nations (86–87).


2.2    Twitter Sentiment Analysis
Many sentiment analysis studies have utilized data from Twitter [11, 12], and senti-
ment analysis of monolingual labelled data can typically attain high rates of precision
and accuracy. Sentiment analysis of multilingual data, on the other hand, poses vari-
ous problems: For some languages there are no existing resources such as sentiment
lexicons or sentiment-labelled corpora with which supervised models could be
trained. Where multilingual sentiment analysis has been undertaken, it often targets
specific language pairs or a small number of languages. Even if sentiment-labelled
corpora exist, low levels of annotator agreement can place an upper limit on the accu-
racy of models [13].

   Emoticons and emoji can be utilized in unsupervised sentiment analysis on the ba-
sis of the fact that they are used in many languages. Tang et al. [14], for example,
used ASCII-based emoticons in Twitter messages to create a sentiment classifier us-
ing neural networks. Jiang et al. [15] used machine learning to create an “Emotion
Space Model” from emoji-containing data obtained from Sina Weibo (a Chinese mi-
croblogging service similar to Twitter).

   In this study, a similar approach has been adopted. Manual annotation of the tweets
in the data was not undertaken, but rather sentiment values assigned on the basis of
aggregate use of emoji in the Kralj-Novak et al. emoji sentiment lexicon. Examination
of the labeled data suggests that the approach can offer acceptable results.
4


3         Data Collection and Processing

653,457,659 tweets with “place” metadata were collected from the Twitter Streaming
API from November 2016 until June 2017 and stored at servers operated by Finland’s
Centre for Scientific Computing.2 From Unicode’s list of all emoji,3 regular expres-
sions were used to identify the 102 unique emoji types that can be used with skin tone
modifiers on the Twitter platform (as of late 2017).


4         Analysis

In a first analysis, the prevalence of use of the skin tone emoji was determined by
country and the median skin tone values calculated and mapped. Semantic properties
of the skin tone emoji were investigated using vector representations, and the relation-
ship between mean skin tone values and sentiment was considered by aggregating
tweets at the level of country or territory.

4.1       Geographic Distribution of Skin Tone Emoji


For each of the 247 country-level administrative units in the data, frequencies of the
default emoji and the skin-tone modified emoji were calculated (Table 1 summarizes
the results for the 10 countries with the most tweets).

    Table 1. Counts of tweets, potential skin tone emoji, and skin tone emoji for the 10 countries
                                        with the most tweets.

             Country          Tweets      Pot. skintone Skin tone Proportion
             USA              201,361,543 11,153,159 8,605,451 0.77
             Brazil           92,987,119      3,763,962        1,497,697 0.40
             Japan            38,598,876      2,228,136        445,699 0.20
             Great Britain 35,837,868         2,497,332        1,387,807 0.56
             Philippines 20,808,246           1,269,572        581,922 0.46
             Argentina        20,023,675      1,836,721        290,912      0.16
             Turkey           19,218,332      1,005,601        356,020      0.35
             Spain            15,822,869      1,086,395        361,905      0.33
             Malaysia         14,787,098      811,196          432,322      0.53
             France           13,825,568      763,565          371,576      0.49

2
     For this data, a high rate of correlation exists between “place” latitude-longitude coordinates
     and “geo” latitude-longitude coordinates for tweets that contain both metadata fields. As
     tweets with “place” attributes are far more numerous than tweets with “geo” attributes, they
     are considered to be an accurate indication of user location.
3
     http://unicode.org/emoji/charts/full-emoji-list.html
                                                                                          5




    Globally, more than 25 million tweets contained emoji that could take skin tone
values, and these tweets contained approximately 19 million skin tone emoji. 54.3%
of tweets with at least one potential skin tone emoji had an emoji with an assigned
skin tone value, and 50.1% of potential skin tone emoji had skin tone values. Users in
the United States, the country of origin of Twitter, of the Unicode standard, and of the
skin-tone emoji, are more likely to use the skin tone modifiers. Anglophone countries
such as Britain and the Philippines also use relatively many skin tone emoji. The pro-
portion of skin-tone-possible emoji that were assigned skin tone according to country
is shown in Fig. 2.




      Fig. 2. Proportion of emoji with skin tone values. Green shades indicate a higher
                     proportion of skin-tone emoji; red shades lower.



4.2     Median Skin Tone Values
The global distribution of skin tone values was as follows: light, 36%; medium-light
25%; medium 20%, medium-dark, 16%, dark, 3%. To some extent, the median skin
tone value by country/territory (Fig. 3) corresponds with levels of yearly insolation,
which in turn affects the average level of skin pigmentation in ancestral human popu-
lation groups (Fig. 4). Lighter-than-expected skin tone values in Asian countries may
6


reflect cultural values associating lighter skin with health and beauty. The higher val-
ue for Afghanistan may be due to the presence of U.S. military personnel in the coun-
try. Darker-than-expected skin tone values in the United States may reflect the dis-
proportional popularity of Twitter among African-Americans (see [16]). For Europe,
darker median skin tone values may reflect enthusiasm for African or African-
American popular culture or migration.




                    Fig. 3. Median value of skin tone emoji by country




                  Fig. 4. Skin color of ancestral populations (source: [17])
                                                                                     7


4.3    Emoji Skin Tone and Sentiment
Two methods were used to investigate the sentiment of the corpus. In a first experi-
ment, word embeddings in multidimensional vector space were created to identify
lexical items close to the skin tone emoji in meaning. In a second experiment, senti-
ment per tweet was calculated by utilizing the Kralj-Novak emoji sentiment classifi-
cation lexicon.


Word Embeddings in Multidimensional Space

Recent work in many types of Natural Language Processing has seen widespread use
of word embeddings for tasks ranging from translation to content extraction, part-of-
speech tagging, parsing, or sentiment analysis. The basic principle underlying these
approaches was alluded to by Firth’s dictum that one shall “know a word by the com-
pany it keeps” [18]. First formally proposed by Harris [19] and sometimes referred to
as the “Distributional Hypothesis”, it refers to the fact that linguistic elements that
show similar collocational and syntactic distributions often exhibit similar semantics;
measures such as pointwise mutual information can be incorporated into models that
quantify the collocational properties of words or n-grams. In word embedding models,
the words in a document or set of documents can be transformed into vectors based on
the probability of their co-occurrence within a specified span.

   The 25,297,245 tweets that contained emoji that could potentially take on skin-tone
values were used to create word embeddings: All unique tokens in these tweets were
assigned values in a 400-dimensional vector space based on an continuous bag-of-
words embedding window of five tokens to the right and left and a minimum of 10
token occurrences in the corpus, using an implementation of the Word2Vec algorithm
[20, 21]. A preliminary insight into the differences in meaning the skin tone emoji can
entail is provided by examining the tokens closest to the skin tone code points in the
resulting vector space (Fig. 5).
8




             Fig. 5. Ten closest tokens in vector space for the five skin-tone emoji


   The cosine distance indicates the similarity of the vectors for the token pair and can
range in value from -1 (opposite vectors, semantics very different) to 1 (identical
vectors, semantics very similar). Among the most similar tokens for all five types are
the other skin tone emoji. This suggests that the semantic value of the skin tone emoji
is well represented by the vector space model, and is to be expected based on the fact
that these characters occur in the same contexts (as code points following a limited set
of face- or body-part code points).

   The types most similar to each of the skin tones in terms of cosine distance give
some insight into the contexts of use of each of the skin tone emoji and hence their
meanings. Light skin tone values are associated with emoji that express affection,
satisfaction, or happiness. Medium-light skin tones are associated with mainly posi-
tive emoji expressing affection or irreverence. Medium skin tone emoji are closest to
emoji with negative affective connotations such as crying and shouting faces, the two-
eyes emoji (possibly used as an expression of incredulity), and a skull emoji, as well
as a character with the numeral 100 and the positive “face with tears of joy” emoji.4
Medium-dark skin tones are additionally associated with the informal English-
language words lol, tho and bruh. The dark skin tone is closest to 100, the two eyes,
the skull, and emoji representing fire, prayer, and speaking.




4
    The 100 emoji was originally used in Japanese mobile communications to indicate a teach-
    er’s mark of 100 points for a school assignment, but in American usage is likely related to
    keep it 100, a phrase meaning to keep it real, or “be honest/authentic”.
                                                                                              9




Labelling sentiment of Tweets

Kralj-Novak et al. [3] provide frequency information for the annotation of tweets
containing emoji in 13 languages as “negative”, “neutral”, or “positive” by manual
annotators.5 Examples of emoji with positive and negative sentiment values are
shown in Table 2: Faces expressing, for example, indifference or anger have negative
values, while faces expressing affection, flowers, or gifts have positive values. 6 From
this lexicon, scores for individual emoji were calculated by subtracting the number of
negatively-evaluated sentences containing a particular emoji from the number of posi-
tively evaluated sentences with the same type and dividing by the total number of
occurrences of that emoji.

            Table 2. Examples of emoji with negative and positive sentiment ratings

                       Emoji     sentiment     Emoji          sentiment
                                 -0.56                        0.96
                                          
                                 -0.48                        0.81
                                          
                                 -0.45                        0.80
                                          
                                 -0.4                         0.77
                                          
                                 -0.39                        0.76
                                          

Emoji with at least 50 occurrences in the Kralj-Novak data were used to evaluate the
sentiment of the 653.5 million tweets in the experimental data. Tweets were cleaned
of usernames, hashtags and urls, converted to lower case, and tokenized using the
NLTK Twitter Tokenizer [22], the Jieba tokenizer for Mandarin [23] and the Tiny
Segmenter for Japanese [24], then scored using the Kralj-Novak sentiment scale. Ex-
amples are shown in Fig. 6.7




5
    http://kt.ijs.si/data/Emoji_sentiment_ranking/
6
    The “Japanese Dolls” emoji represents figures used in a traditional Japanese observance.
7
    Negative emoji are less frequently used in the data. Negative sentiment, in general, is less
    frequently expressed [25].
10




    Fig. 6. Four tweets in which country of origin, automatically detected language, original text,
               calculated emoji-sentiment value, and tokens after cleaning are shown.


Correlation of Skin Tone and Sentiment

Mean skin tone per tweet was calculated by assigning the values 1 to 5 to the skin
tone emoji, then for each tweet, dividing the sum of the skin tone emoji values by the
number of skin tone emoji.8 Mean skin tone and sentiment were correlated in all
tweets containing at least one skin tone emoji for the entire data and at coun-
try/territorial level by using Pearson’s product-moment correlation. For the entire
data, a weak negative correlation between sentiment and skin tone values was found
(r = -0.09, df = 13,736,953, p < 10-32).

   At the level of country/territory, the correlation between mean sentiment and mean
skin tone value was more strongly negative, at r = -0.25 (df = 235, p = 0.000076) for
the 237 countries or territories with at least one skin tone emoji (Fig. 7).




8
     The values assigned were: light skin tone = 1, medium-light skin tone = 2, medium skin tone
     = 3, medium-dark skin tone = 4, dark skin tone = 5.
                                                                                     11




Fig. 7. Correlation between Mean Tweet Sentiment and Mean Tweet Skin Tone for 237 Coun-
                   tries/Territories (Shaded Area = 95% Confidence Interval)




    To mitigate the effects small sample size (e.g. for countries in which only one or a
few users contributed most or all of the skin-tone emoji), the model was refitted for
the 50 countries/territories with the highest number of tweets (Fig. 8), with the result
that the negative relationship between mean tweet sentiment and mean tweet skin tone
strengthened to r = -0.28 (df = 48, p = 0.051).
12




 Fig. 8. Correlation between Mean Tweet Sentiment and Mean Tweet Skin Tone for 50 Coun-
      tries with the Largest Number of Tweets (Shaded Area = 95% Confidence Interval)

   The countries with lighter mean skin tone values, such as Egypt, Paraguay, Qatar,
Indonesia, or the UAE, have higher mean sentiment scores. European countries, with
mean skin tone values ranging from approximately 1.7 to 2.25, have middling senti-
ment values that fall within the 95% confidence interval. The countries with high
mean skin tone values, such as the United States, Nigeria, or Kenya, have lower mean
sentiment values.


5      Summary and Discussion

Since their introduction into the Unicode scheme in 2015, skin tone emoji have be-
come a widely used resource on the Twitter platform. Their global distribution, se-
mantic properties, and patterning with tweet sentiment were investigated in a large
corpus of tweets containing geographical metadata by using word embeddings and an
emoji sentiment lexicon [3]. Some caveats apply when using the Kralj-Novak et al.
lexicon: The number of emoji in the Unicode Standard has increased considerably
                                                                                      13


since creation of the resource. So, for example, an emoji such as            (U+1F595
REVERSED HAND WITH MIDDLE FINGER EXTENDED), introduced with Emoji
7.0 in 2014, does not have a value in the classification scheme, although it is likely to
be used to mark negative affect. Likewise, emoji introduced since 2015, such as se-
quences consisting of two or more code points joined together (used to represent e.g.
flags or groups of persons), are not in the lexicon, and nor are the skin tone emoji
themselves. Nonetheless, the broad coverage of the lexicon, in which most of the
emoji which can be paired with skin tone emoji are assigned a sentiment value, makes
sentiment inference of tweets containing skin tone emoji feasible. The geographical
distribution of skin tone emoji, their semantics, and the correlation of skin tone and
sentiment suggest several preliminary interpretations.

    The global distribution of skin tone emoji shows that to a certain extent, the char-
acters are being used on Twitter as intended by the originators of the Unicode pro-
posal: to make it possible for people to represent their own skin color in online com-
munication. While darker skin tone emoji are used in Africa and the United States,
lighter skin tones are more numerous globally and are more likely to be used in Asia,
the Middle East, and parts of Latin America and Europe. For the U.S., the prevalence
of darker skin tone emoji may in part be explained by the popularity of Twitter among
African-Americans, who are overrepresented on the platform compared to their share
of the population [16]. For Europe, darker emoji skin tones may indicate a youthful
Twitter user population: in general, younger users are more likely to utilize non-
standard linguistic resources such as emoji on CMC [26], and in Europe younger
people (presumably including some Twitter users) are more likely to come from im-
migrant backgrounds. In Asia, the Middle East, and parts of Latin America, lighter
skin tone emoji may reflect cultural norms concerning the physical attributes of health
and beauty in which skin color can play an important role [27], a fact that has been
documented in research into body satisfaction [28], attractiveness ratings [29], or use
of skin whitening products, particularly by females [30].

   The semantics of skin tone emoji are, in part, manifest in a multidimensional vec-
tor space model. Lighter skin tone emoji are more similar in their collocational prop-
erties (and hence semantics) to other emoji that can be interpreted as expressing gen-
erally positive affect, such as smiling faces and heart symbols, while darker skin tone
emoji are more closely associated with symbols that express other affective states,
including distress, as well as non-standard word forms. The finding corresponds to
that of Ljubešić and Fišer [10], who do not investigate sentiment or emoji skin tone
directly, but cluster countries based on their emoji distributions. They note that the
two darkest skin-tone values, as well as several emoji depicting unhappy faces, be-
long to a “fourth world” cluster of mainly African countries.

   The association between darker emoji skin tone and (possible) negative affect is al-
so manifest when sentiment and mean emoji skin tone and sentiment are regressed.
The negative relationship between sentiment and skin tone is weak when all tweets
with skin tone emoji are considered, having a value of r = -0.09. Because there are so
14


many more skin tone emoji tweets in the data from the United States than from other
countries, the value is almost the same as that for the United States data alone. When
considered at country- or territorial level, however, the value is more strongly nega-
tive at r = -0.25, increasing in strength to r = -0.28 when only the 50 countries with
the most tweets are considered. The negative association between sentiment and
emoji skin tone in this data is in accord with the implicit findings of the Ljubešić and
Fišer study, and parallels the results of survey-based measures of happiness by coun-
try, in which many countries of the developing world, primarily in Africa, report low
levels of well-being and happiness [31].

    The large amount of data collected in this study makes more specific country-level
analyses of skin tone emoji use possible. Considering the fact that language- and ge-
ography-based differences in emoji usage and meaning have been found [8], the se-
mantics of skin tone emoji in particular languages, countries, or geographical contexts
could be more closely examined using vector spaces. Other future work could include
updates and refinements to the emoji sentiment lexicon, as well as the utilization of
more sophisticated sentiment models based on machine learning, support vector ma-
chines, or neural networks. Parsed data containing skin tone emoji could be analyzed
to consider evaluative use of skin tone emoji.

   As skin tone emoji continue to gain in popularity worldwide, techniques for meas-
uring and evaluating the ways in which they are used are likely to play a role in NLP
tasks pertaining to information extraction in bi- and multilingual contexts. In a broad-
er perspective, the analysis of skin tone emoji use can give insight into on how hu-
mans represent themselves on social media, what kinds of attitudes and meanings are
associated with skin color, and how language is used to depict the phenotypical diver-
sity of the shared human condition.


Acknowledgement

The author thanks Finland’s Centre for Scientific Computing (CSC) for providing
access to computational and data storage facilities.


6      References
 1. Davis, M. and Edberg, P.: Unicode emoji (Unicode Technical Standard #51) (2015).
 2. Dashtipour, K., Poria, S., Hussain, A., Cambria, E., Hawalah, A. Y. A., Gelbukh, A.,
    Zhou, Q.: Multilingual sentiment analysis: State of the art and independent comparison of
    techniques. Cognitive Computation (8), 757–771 (2016).
 3. Kralj-Novak, P., Smailović, J., Sluban, B., and Mozetič, I.: Sentiment of emojis. PLoS
    ONE 10(12) (2015).
 4. Bamman, D, Eisenstein, J., Schnoebelen, T.: Gender identity and lexical variation in social
    media. Journal of Sociolinguistics (18), 135–160 (2014).
 5. Schnoebelen, T.: Do you smile with your nose? Stylistic variation in Twitter emoticons.
    University of Pennsylvania Working Papers in Linguistics 18(2), 115–125 (2012).
                                                                                             15


 6. Coats, S.: Grammatical feature frequencies of English on Twitter in Finland. In: Squires,
    L. (ed.), English in computer-mediated communication: Variation, representation, and
    change, pp. 179–210. De Gruyter Mouton, Berlin (2016).
 7. Pavalanathan, U., Eisenstein, J.: Emoticons vs. emojis on Twitter: A casual inference ap-
    proach. arXiv: 1510.080480v1 [cs.CL] (2015).
 8. Barbieri, F., Kruszewski, G., Ronzano, F., Saggion, H.: How cosmopolitan are emojis?
    Exploring emojis usage and meaning over different languages with distributional seman-
    tics. MM '16 Proceedings of the 2016 ACM on Multimedia Conference, Amsterdam, the
    Netherlands — October 15–19, 2016, pp. 531–535. ACM, New York (2016).
 9. McGill, A.: Why White people don’t use white emoji. The Atlantic.
    https://www.theatlantic.com/politics/archive/2016/05/white-people-dont-use-white-
    emoji/481695/ (2016).
10. Ljubešić, N., Fišer, D.: A Global Analysis of Emoji Usage. In: Proceedings of the 10th
    Web as Corpus Workshop (WAC-X) and the EmpiriST Shared Task, pp. 82–89, Berlin,
    Germany, August 7–12, 2016. Association for Computational Linguistics, Stroudsburg,
    PA (2016).
11. Pak, A., Paroubek, P.: Twitter as a corpus for sentiment analysis and opinion mining. In:
    Calzolari, N., Choukri, K., Maegaard, B., Mariani, J., Odijk, J., Piperidis, S., Rosner, M.,
    Tapias., D. (eds.), Proceedings of LREC, pp. 1320–1326 (2010).
12. Kolchyna, O., Souza, T., Treleaven, P., Aste, T.: Twitter sentiment analysis: Lexicon
    method, machine learning method and their combination. arXiv: 1507.00995v3 [cs.CL]
    (2015).
13. Mozetič, I., Grčar, M., Smailović, J.: Multilingual Twitter sentiment classification: The
    role of human annotators. PLoS ONE11(5) (2016).
14. Tang, D., Wei, F., Qin, B., Zhou, M., Liu, T.: Building a large-scale Twitter-specific sen-
    timent lexicon: A representation learning approach. In: COLING 2014, pp. 172–182
    (2014).
15. Jiang, F., Liu, Y.-Q., Luan, H.-B., Sun, J.-S., Zhu, X., Zhang, M., Ma, S.-P.: Microblog
    sentiment analysis with emotion space model. Journal of Computer Science and Technolo-
    gy 30(5), 1120–1129 (2015).
16. Duggan, M.: Mobile messaging and social media 2015. Report, Pew Research Center
    (2015).
17. Chaplin, G.: Geographic distribution of environmental factors influencing human skin col-
    oration. American Journal of Physical Anthropology 125, 292–302 (2004).
18. Firth, J. R. Papers in linguistics, 1934–1951. Oxford University Press, London (1957).
19. Harris, Z. Mathematical structures of language. Interscience, New York (1968).
20. Mikolov, T., Yih, W., Zweig, G.: Linguistic regularities in continuous space word repre-
    sentations. In: Proceedings of HLT-NAACL 13, pp. 746–751 (2013).
21. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In:
    Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp.
    45–50 (2010).
22. Bird, S., Loper, E., Klein, E.: Natural language processing with Python. O'Reilly Media
    Inc., Newton, MA (2009).
23. Sun, J.: Jieba: Chinese word segmentation module. https://github.com/fxsjy/jieba (2014).
24. Hagiwara,       M.:       Tinysegmenter:      Tokenizer     specified     for      Japanese.
    https://github.com/SamuraiT/tinysegmenter (2014).
25. Dodds, P., Clark, E., Desu, S., Frank, M., Reagan, A., Williams, J.: Human language re-
    veals a universal positivity bias. PNAS 112(8), 2389–2394 (2015).
16


26. Pavalanathan, U., Eisenstein, J. Confounds and consequences in geotagged Twitter data.
    arXiv:1506.02275v2 [cs.CL] (2016).
27. Li, E., Min, H., Belk, R., Kimura, J., Bahl, S.: Skin lightening and beauty in four Asian
    cultures. In: Lee, A., Soman, D. (eds.), Advances in Consumer Research Volume 35, pp.
    444–449. Association for Consumer Research, Duluth, MN (2008).
28. Sahay S., Piran, N.: Skin-color preferences and body satisfaction among South Asian-
    Canadian and European-Canadian female university students. Journal of Social Psycholo-
    gy 137(2), 161–171 (1997).
29. Swami, V., Furnham, A., Joshi, K.: The influence of skin tone, hair length, and hair colour
    on ratings of women's physical attractiveness, health and fertility. Scandinavian Journal of
    Psychology 49, 429–437 (2008).
30. Peltzer, K., Pengpid, D., James, C.: The globalization of whitening: prevalence of skin
    lighteners (or bleachers) use and its social correlates among university students in 26 coun-
    tries. International Journal of Dermatology 55(2), 165–172 (2016).
31. Helliwell, J. F., Huang, H., Wang, S.: The social foundations of world happiness. In: Hel-
    liwell, J. F., Layard, R., Sachs, J. (eds.). Word Happiness Report 2017. Columbia Univer-
    sity Center for Sustainable Development, New York (2017).