<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The UNIBA System at the EVALITA 2018 Italian Emoji Prediction Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lucia Siciliani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniela Girardi</string-name>
          <email>daniela.girardig@uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Bari Aldo Moro</institution>
          ,
          <addr-line>Via E. Orabona, 4 - 70125 Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes our participation in the ITAmoji task at EVALITA 2018 (Ronzano et al., 2018). Our approach is based on three sets of features: micro-blog and keyword features, sentiment lexicon features, and semantic features. We exploit these features to train and combine several classifiers using different libraries. The results show that the selected features are not appropriate for training a linear classifier to properly address the emoji prediction task.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Nowadays, emojis are widely used to express sentiments and emotions in written communication, which is becoming more and more popular due to the increasing use of social media. Emojis can help users express and encode many different messages and, being very intuitive, they can be easily interpreted by a wide audience. However, their meaning is sometimes misleading, resulting in the misunderstanding of the entire message. Emoji prediction has captured the interest of researchers, since it could help improve sentiment analysis and user profiling tasks, as well as the retrieval of social network material.</p>
      <p>
        In particular, in the context of the International Workshop on Semantic Evaluation (SemEval 2018), the Multilingual Emoji Prediction Task
        <xref ref-type="bibr" rid="ref2 ref8">(Barbieri et al., 2018)</xref>
        was proposed, challenging the research community to automatically model the semantics of emojis occurring in English and Spanish Twitter messages. During this challenge,
        <xref ref-type="bibr" rid="ref1">Barbieri et al. (2017)</xref>
        created a model that outperforms humans in predicting the most probable emoji associated with a given tweet.
      </p>
      <p>Twitter supports more than 1,000 emojis, belonging to different categories (e.g. smileys and people, animals, fruits), and this number keeps growing.</p>
      <p>
        In this paper, we use a set of features which showed promising results in predicting the sentiment polarity of tweets
        <xref ref-type="bibr" rid="ref3">(Basile and Novielli, 2014)</xref>
        in order to understand whether they can also be used to predict emojis. The paper is organized as follows: Section 2 describes the system and the exploited features, while in Section 3 we report the results obtained using different classifiers and their ensemble. Finally, in Section 4 we discuss our findings and Section 5 reports the conclusions.
      </p>
      <p>2 System Description
In this section, we describe the approach used to solve the ITAmoji challenge. The task is structured as a multi-class classification problem, since each tweet must be assigned exactly one of 25 mutually exclusive emojis.</p>
      <p>The feature extraction was performed entirely in Java. First of all, each tweet was tokenized and stop-words were removed by exploiting the “Twitter NLP and Part-of-Speech Tagging” API (http://www.cs.cmu.edu/~ark/TweetNLP/) developed at Carnegie Mellon University. No other NLP steps, such as stemming or PoS-tagging, were applied, since those features were considered not relevant for this particular kind of task.</p>
      <p>Then we moved to the extraction of the features from the training data. These features can be categorized into three sets: one addressing keyword and micro-blog features, a second one exploiting the polarity of each word in a sentiment lexicon, and a third one using word representations obtained through a distributional semantic model.</p>
      <p>A description of the different sets of features will
be provided in Section 2.1.</p>
      <p>
        After the feature extraction, we obtained a total set of 342 features to be used to train a linear classifier. For classification, we decided to exploit the Weka API (http://www.cs.waikato.ac.nz/ml/weka/) and to use an ensemble of three different classifiers in order to obtain better predictive results. The three classifiers are: L2-regularized L2-loss support vector classification, L2-regularized logistic regression, and a random forest classifier. The first two algorithms are based on the WEKA wrapper class for the Liblinear classifier
        <xref ref-type="bibr" rid="ref6">(Fan et al., 2008)</xref>
        and were trained on the whole set of features, while the random forest was trained only on the keyword and micro-blog features. The classifiers were combined using the soft-voting technique, which averages the probability scores returned by the individual classifiers.
      </p>
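      <p>As an illustration, soft voting simply averages the per-class probability distributions returned by the individual classifiers and predicts the class with the highest mean probability. A minimal sketch with made-up probabilities over three classes:</p>
      <preformat>
```python
import numpy as np

# Made-up probability outputs of three classifiers over three classes.
p_svm = np.array([0.6, 0.3, 0.1])
p_logreg = np.array([0.5, 0.4, 0.1])
p_forest = np.array([0.2, 0.5, 0.3])

# Soft voting: average the distributions, then pick the arg-max class.
avg = (p_svm + p_logreg + p_forest) / 3
prediction = int(np.argmax(avg))

print(avg)         # mean probability for each class
print(prediction)  # index of the predicted class
```
      </preformat>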
      <p>
        In light of the results provided by the task organizers, we conducted an in-depth analysis of our solution and discovered that, due to a problem in the Liblinear WEKA wrapper, not all the classifiers returned a set of probability scores for multi-class classification, thus compromising the results of the whole ensemble. Therefore, although outside the time frame of the challenge, we decided to use scikit-learn
        <xref ref-type="bibr" rid="ref4">(Buitinck et al., 2013)</xref>
        to rebuild our classifiers and evaluate the impact of the selected features.
      </p>
      <p>All the results will be summarized and
discussed in Section 3 and Section 4.</p>
      <p>
        2.1 Features
As in the previous work of
        <xref ref-type="bibr" rid="ref3">(Basile and Novielli, 2014)</xref>
        , we defined three groups of features, based on (i) keyword and micro-blogging characteristics, (ii) a sentiment lexicon, and (iii) a Distributional Semantic Model (DSM). Keyword-based features exploit the tokens occurring in the tweets, considering only unigrams. During the tokenization phase, user mentions, URLs and hashtags are replaced with three meta-tokens, “USER”, “URL” and “TAG”, in order to count them and include their number as features. Other features connected to the micro-blogging environment are: the presence of exclamation and interrogative marks; adversative, disjunctive, conclusive and explicative words; the use of uppercase; and informal expressions of laughter, such as “ah ah”. The list of micro-blogging features is reported in Table 1.
      </p>
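      <p>A minimal sketch of this kind of feature extraction (illustrative only: the regular expressions and feature names below are our own simplification, not the exact implementation of the system):</p>
      <preformat>
```python
import re

def microblog_features(tweet: str) -> dict:
    """Count micro-blog cues in a tweet (simplified illustration)."""
    return {
        # Counts of user mentions, URLs and hashtags (the meta-token features).
        "user": len(re.findall(r"@\w+", tweet)),
        "url": len(re.findall(r"https?://\S+", tweet)),
        "tag": len(re.findall(r"#\w+", tweet)),
        # Punctuation cues.
        "exclamation": tweet.count("!"),
        "interrogative": tweet.count("?"),
        # Orthographic cues.
        "uppercase_ch": sum(1 for c in tweet if c.isupper()),
        # Informal laughter such as "ahah", "ahahah", ...
        "ahah_repetition": len(re.findall(r"(?:ah){2,}", tweet.lower())),
    }

print(microblog_features("Che bello!! ahahah @amico guarda http://t.co/x #estate"))
```
      </preformat>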
      <p>
        The second block of features consists of sentiment lexicon features. As Italian lexical database, we used MultiWordNet
        <xref ref-type="bibr" rid="ref7">(Pianta et al., 2002)</xref>
        , where each lemma is assigned a positive, a negative and a neutral score. In particular, we include features based on the prior polarity of the words in the tweets. To deal with mixed-polarity cases, we defined two sentiment variation features so as to capture the simultaneous expression of positive and negative sentiment. We decided to include features related to the polarity of the tweets since emojis can be intuitively categorized into positive and negative ones and are usually used to reinforce the sentiment expressed. The list of sentiment lexicon features is reported in Table 2. The last group of features is the semantic one, which exploits a Distributional Semantic Model. We used the vector embeddings of each word and the superposition operator
        <xref ref-type="bibr" rid="ref9">(Smolensky, 1990)</xref>
        to compute an overall vector representation of the tweet. Analogously, we computed a prototype vector for each polarity class (positive, negative, subjective and objective) as the sum of the vector representations of all the tweets belonging to that class. Finally, we computed the element-wise minimum and maximum of the vector representations of the words in each tweet, and the resulting vectors were concatenated and used as features. This approach has proved to work well, and to be easy to compute, for small texts like tweets and other micro-blog posts
        <xref ref-type="bibr" rid="ref5">(De Boom et al., 2016)</xref>
        . The list of semantic features is reported in Table 3.
      </p>
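      <p>The tweet-level representations described above can be sketched with NumPy as follows (toy 4-dimensional embeddings and a toy prototype; the real system uses pre-trained word vectors):</p>
      <preformat>
```python
import numpy as np

# Toy word embeddings for the words of one tweet (4-dimensional for brevity).
words = np.array([
    [0.1, 0.4, -0.2, 0.0],
    [0.3, -0.1, 0.5, 0.2],
    [-0.2, 0.2, 0.1, 0.4],
])

# Superposition operator: the tweet vector is the sum of its word vectors.
tweet_vec = words.sum(axis=0)

# Element-wise minimum and maximum over the word vectors, then concatenation.
pooled = np.concatenate([words.min(axis=0), words.max(axis=0)])

# Cosine similarity between the tweet vector and a (toy) class prototype.
prototype = np.array([0.2, 0.5, 0.4, 0.6])
cos = tweet_vec @ prototype / (np.linalg.norm(tweet_vec) * np.linalg.norm(prototype))

print(tweet_vec)     # the superposed tweet representation
print(pooled.shape)  # min and max pooling doubles the dimensionality
```
      </preformat>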
      <p>Table 1: Micro-blogging features
Microblog        Description
tag              total occurrences of hashtags
url              total occurrences of URLs
user             total occurrences of user mentions
neg count        total occurrences of the negation word “non”
exclamation      total occurrences of exclamation marks
interrogative    total occurrences of interrogative marks
adversative      total occurrences of adversative words
disjunctive      total occurrences of disjunctive words
conclusive       total occurrences of conclusive words
explicative      total occurrences of explicative words
uppercase ch     number of upper-case characters
repeat ch        number of consecutive repetitions of a character in a word
ahah repetition  total occurrences of the “ahah” laughter expression</p>
      <p>3 Results
The goal of the ITAmoji challenge is to evaluate the capability of each system to predict the right emoji associated with a tweet, regardless of its position in the text.</p>
      <p>The organizers selected a subset of 25 emojis and provided 250,000 tweets for training; each tweet contains exactly one emoji, which is extracted from the text and given as the target label. The training set is very unbalanced, since three emojis (i.e. red heart, face with tears of joy, and smiling face with heart eyes) represent almost 50% of the whole dataset.</p>
      <p>For the evaluation, the organizers created a test set made up of 25,000 tweets, keeping the ratio of the different classes over the whole set unchanged. The prediction for each tweet consists of the list of all 25 emojis, ordered by their probability of being associated with the tweet: in this way, it is possible to evaluate the systems according to their accuracy up to a certain position in the rank. Nevertheless, only the first emoji was mandatory for the submission.</p>
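      <p>Accuracy@k over such ranked predictions can be computed as in the following sketch (the rankings and gold labels are invented for illustration):</p>
      <preformat>
```python
def accuracy_at_k(rankings, gold, k):
    """Fraction of tweets whose gold emoji appears in the top-k of the ranking."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if g in ranked[:k])
    return hits / len(gold)

# Invented ranked predictions (best-first) for three tweets, with gold labels.
rankings = [
    ["red heart", "face with tears of joy", "sparkles"],
    ["face with tears of joy", "red heart", "sun"],
    ["sun", "sparkles", "red heart"],
]
gold = ["red heart", "sun", "red heart"]

print(accuracy_at_k(rankings, gold, 1))  # only the top prediction counts
print(accuracy_at_k(rankings, gold, 3))  # gold emoji anywhere in the top 3
```
      </preformat>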
      <p>Systems were ranked according to the macro F-measure, but other metrics were also calculated, i.e. the micro F-measure, the weighted F-measure, the coverage error and the accuracy (measured @5, @10, @15 and @20). The final results of the challenge are reported in Table 5. We can see that, while there is quite a difference between the macro-F1 scores of the various runs, the same does not happen with the micro F1 scores.</p>
      <p>The same happens with the accuracy scores where, setting aside two runs, all the others obtain a result between 0.5 and 0.8. In other words, even if the macro-F1 measure appears to be the most discriminating factor among the runs, such a result is driven by the presence of a few classes with a very large number of instances, which causes the classifiers to overfit on them.</p>
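      <p>The gap between macro and micro F1 on unbalanced data can be reproduced with scikit-learn: a degenerate classifier that always predicts the majority class obtains a high micro F1 but a low macro F1, since macro averaging weights every class equally (toy labels for illustration):</p>
      <preformat>
```python
from sklearn.metrics import f1_score

# Toy unbalanced gold labels: class 0 dominates, classes 1 and 2 are rare.
y_true = [0] * 8 + [1] + [2]
# A degenerate classifier that always predicts the majority class.
y_pred = [0] * 10

micro = f1_score(y_true, y_pred, average="micro", zero_division=0)
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(micro)  # high: dominated by the majority class
print(macro)  # low: the ignored rare classes each contribute an F1 of 0
```
      </preformat>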
      <p>Table 6 summarizes the results obtained using both WEKA (the submitted run, highlighted in italic) and scikit-learn. We used the scikit-learn library to perform a classification using logistic regression and then added, through a soft-voting technique, a Naive Bayes classifier and a Random Forest (rows 4 and 5, respectively).</p>
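      <p>A minimal sketch of such a scikit-learn soft-voting ensemble, on synthetic data (the estimators and hyper-parameters shown are illustrative, not our exact configuration):</p>
      <preformat>
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the tweet feature matrix (342 features in our system).
X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=3, random_state=42)

# Soft voting averages the per-class probabilities of the three estimators.
ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB()),
        ("forest", RandomForestClassifier(random_state=42)),
    ],
    voting="soft",
)
ensemble.fit(X, y)

# predict_proba gives the ranked scores needed for the per-tweet emoji ranking.
proba = ensemble.predict_proba(X[:1])
print(proba.shape)
```
      </preformat>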
      <p>
        From these results we can see that, independently of the classifier used, the final results in terms of the evaluation metrics over the test dataset remain quite similar. Specifically, these results depend on the fact that our system predicts only two labels in first position, namely “red heart” and “face with tears of joy”, and is therefore unable to correctly classify the other classes, as shown in Table 4. This outcome is probably due to the set of features that we used, which does not manage to appropriately model the data in this task, even though it proved to be successful in another sentiment analysis context
        <xref ref-type="bibr" rid="ref3">(Basile and Novielli, 2014)</xref>
        . In the last column of Table 6, we report the average macro-F1 obtained by performing 5-fold cross-validation. The value for the first evaluation was not calculated because of the library fault described in Section 2.
      </p>
      <p>4 Discussion
The overall results of the challenge show that this task is non-trivial and difficult to solve with high precision, and the reason is intrinsic to the task itself. First of all, there are several emojis which often differ only slightly from each other; furthermore, their meaning deeply depends on the single user and on the context. In fact, a single emoji could be used to convey both joy and fun or, on the contrary, it could be used ironically with a negative meaning. To this extent, an interesting update of the task could be to leave the text of the tweet as it is, so that the position of the emoji could also be exploited to detect irony and other variations.</p>
      <p>From the analysis of the overall results of the task, it emerged that there is a large gap between the macro-F1 scores which is not reflected in the micro-F1 scores. For this particular task, where both the training and the test datasets are heavily unbalanced, we think that the micro-F1 score is better suited to capture the performance of the submitted systems, since it takes into account the support of each class.</p>
      <p>There is one particularly interesting result, namely the value of the 5-fold cross-validation using only the logistic regression as a classifier, which is particularly high (0.358) and is in contrast with the final score. This aspect surely needs further investigation.</p>
      <p>Table 3: Semantic features
the similarity between the tweet vector and the negative prototype vector
the similarity between the tweet vector and the positive prototype vector
the similarity between the tweet vector and the subjective prototype vector
the similarity between the tweet vector and the objective prototype vector
the element-wise minimum of the vector representations of the words in the tweet
the element-wise maximum of the vector representations of the words in the tweet</p>
      <p>5 Conclusions
In this paper, we presented our contribution to the ITAmoji task of the EVALITA 2018 campaign.</p>
      <p>We tried to model the data by extracting features based on keyword and micro-blogging characteristics, on a sentiment lexicon and, finally, on word embeddings. Apart from the characteristics of the different libraries available for machine learning purposes, the results show that, independently of the classifier, those features do not fit this problem. As future work, this analysis could be extended with an ablation study, which would allow us to understand whether there are noisy features.</p>
      <p>Table 4: Per-class precision, recall, F1 score and support on the test set
label                            precision  recall  f1-score  support
beaming face with smiling eyes       0.000   0.000     0.000     1028
blue heart                           0.500   0.002     0.004      444
face blowing a kiss                  0.500   0.002     0.005      506
face savoring food                   0.000   0.000     0.000      834
face screaming in fear               0.000   0.000     0.000      387
face with tears of joy               0.313   0.448     0.369     4966
flexed biceps                        0.000   0.000     0.000      417
grinning face                        0.000   0.000     0.000      885
grinning face with sweat             0.000   0.000     0.000      379
kiss mark                            0.000   0.000     0.000      279
loudly crying face                   0.000   0.000     0.000      373
red heart                            0.259   0.909     0.403     5069
rolling on the floor laughing        0.000   0.000     0.000      546
rose                                 0.125   0.004     0.007      265
smiling face with heart eyes         0.135   0.004     0.008     2363
smiling face with smiling eyes       0.167   0.000     0.002     1282
smiling face with sunglasses         0.000   0.000     0.000      700
sparkles                             0.000   0.000     0.000      266
sun                                  0.000   0.000     0.000      319
thinking face                        0.000   0.000     0.000      541
thumbs up                            0.000   0.000     0.000      642
top arrow                            0.000   0.000     0.000      347
two hearts                           0.000   0.000     0.000      341
winking face                         0.000   0.000     0.000     1338
winking face with tongue             0.000   0.000     0.000      483
avg / total                          0.164   0.274     0.156    25000</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Miguel Ballesteros, and
          <string-name>
            <given-names>Horacio</given-names>
            <surname>Saggion</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Are emojis predictable?</article-title>
          .
          <source>arXiv preprint arXiv:1702.07285</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Jose Camacho-Collados, Francesco Ronzano, Luis Espinosa Anke, Miguel Ballesteros, Valerio Basile, Viviana Patti, and
          <string-name>
            <given-names>Horacio</given-names>
            <surname>Saggion</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>SemEval 2018 Task 2: Multilingual emoji prediction</article-title>
          .
          <source>In Proceedings of The 12th International Workshop on Semantic Evaluation</source>
          , pages
          <fpage>24</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nicole</given-names>
            <surname>Novielli</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>UNIBA at EVALITA 2014-SENTIPOLC task: Predicting tweet sentiment polarity combining micro-blogging, lexicon and semantic features</article-title>
          .
          <source>Proceedings of EVALITA</source>
          , pages
          <fpage>58</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Lars</given-names>
            <surname>Buitinck</surname>
          </string-name>
          , Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          , Alexandre Gramfort, Jaques Grobler, Robert Layton,
          <string-name>
            <given-names>Jake</given-names>
            <surname>VanderPlas</surname>
          </string-name>
          , Arnaud Joly, Brian Holt, and Gaël Varoquaux.
          <year>2013</year>
          .
          <article-title>API design for machine learning software: experiences from the scikit-learn project</article-title>
          .
          <source>In ECML PKDD Workshop: Languages for Data Mining and Machine Learning</source>
          , pages
          <fpage>108</fpage>
          -
          <lpage>122</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Cedric</given-names>
            <surname>De Boom</surname>
          </string-name>
          , Steven Van Canneyt, Thomas Demeester, and
          <string-name>
            <given-names>Bart</given-names>
            <surname>Dhoedt</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Representation learning for very short texts using weighted word embedding aggregation</article-title>
          .
          <source>Pattern Recognition Letters</source>
          ,
          <volume>80</volume>
          :
          <fpage>150</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Rong-En</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kai-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Cho-Jui</given-names>
            <surname>Hsieh</surname>
          </string-name>
          , Xiang-Rui Wang, and
          <string-name>
            <given-names>Chih-Jen</given-names>
            <surname>Lin</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>LIBLINEAR: A library for large linear classification</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>9</volume>
          (Aug):
          <fpage>1871</fpage>
          -
          <lpage>1874</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Emanuele</given-names>
            <surname>Pianta</surname>
          </string-name>
          , Luisa Bentivogli, and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Girardi</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Multiwordnet: developing an aligned multilingual database</article-title>
          .
          <source>First International Conference on Global WordNet (GWC), India</source>
          , January.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Ronzano</surname>
          </string-name>
          , Francesco Barbieri, Endang Wahyu Pamungkas, Viviana Patti, and
          <string-name>
            <given-names>Francesca</given-names>
            <surname>Chiusaroli</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Italian Emoji Prediction (ITAMoji) Task</article-title>
          .
          <source>In Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Paul</given-names>
            <surname>Smolensky</surname>
          </string-name>
          .
          <year>1990</year>
          .
          <article-title>Tensor product variable binding and the representation of symbolic structures in connectionist systems</article-title>
          .
          <source>Artificial intelligence</source>
          ,
          <volume>46</volume>
          (
          <issue>1- 2</issue>
          ):
          <fpage>159</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>