<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ITAmoji 2018: Emoji Prediction via Tree Echo State Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniele Di Sarli</string-name>
          <email>d.disarli@studenti.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Gallicchio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Micheli</string-name>
          <email>michelig@di.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science University of Pisa</institution>
          ,
          <addr-line>Pisa</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>For the “ITAmoji” EVALITA 2018 competition we mainly exploit a Reservoir Computing approach to learning, with an ensemble of models for trees and sequences. The sentences for the tree models are processed by a language parser, and the words are encoded with pretrained FastText word embeddings for the Italian language. With our method, we ranked 3rd out of 5 teams.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Echo State Networks
        <xref ref-type="bibr" rid="ref13">(Jaeger and Haas, 2004)</xref>
        are an efficient class of recurrent models under the framework of Reservoir Computing
        <xref ref-type="bibr" rid="ref16">(Lukoševičius and Jaeger, 2009)</xref>
        , where the recurrent part of the model (the “reservoir”) is carefully initialized and then left untrained
        <xref ref-type="bibr" rid="ref7">(Gallicchio and Micheli, 2011)</xref>
        . The only weights that are trained belong to a usually simple readout layer, trained in closed form, e.g. by Moore-Penrose pseudo-inversion or by Ridge Regression. Echo State Networks were originally designed to work on sequences; however, it has been shown how to extend them to deal with recursively structured data, and
trees in particular, with Tree Echo State Networks
        <xref ref-type="bibr" rid="ref8">(Gallicchio and Micheli, 2013)</xref>
        , also referred to as TreeESNs.
      </p>
      <p>[Figure 1: the 25 emojis considered in the task, with relative frequencies in the dataset ranging from 20.27% down to 1.06%.]</p>
      <p>
        We follow this approach for solving the ITAmoji task in the EVALITA 2018 competition
        <xref ref-type="bibr" rid="ref19">(Ronzano et al., 2018)</xref>
        . In particular, we parse the input texts into trees resembling the grammatical structure of the sentences, and then use multiple TreeESN models to process the parse trees and make predictions. Finally, we merge these models in an ensemble to produce our final predictions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Task and Dataset</title>
      <p>Given a set of Italian tweets, the goal of the
ITAmoji task is to predict the most likely emoji
associated with each tweet. The dataset contains
250,000 tweets in Italian, each of them originally
containing only one (possibly repeated) of the 25
emojis considered in the task (see Figure 1). The
emojis are removed from the sentences and used
as targets.</p>
      <p>The test set contains 25,000 tweets, processed in the same way.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Preprocessing</title>
      <p>The provided dataset has been shuffled and split
into a training set (80%) and a validation set
(20%).</p>
      <p>
        We preprocessed the data by first removing all URLs from the sentences, as most of them did not carry any informative content (e.g. “https://t.co/M3StiVOzKC”). We then parsed the sentences with two different parsers for the Italian language: Tint
        <xref ref-type="bibr" rid="ref18">(Palmero Aprosio and Moretti, 2016)</xref>
        , which emits data in the CoNLL-U format
        <xref ref-type="bibr" rid="ref17">(Nivre et al., 2016)</xref>
        , a revised version of the CoNLL-X format
        <xref ref-type="bibr" rid="ref4">(Buchholz and Marsi, 2006)</xref>
        , and spaCy
        <xref ref-type="bibr" rid="ref12">(Honnibal and Johnson, 2015)</xref>
        . This produced two sets of trees, both including information about the dependency relations between the nodes of each tree. Finally, we replaced each word with its corresponding pretrained FastText embedding
        <xref ref-type="bibr" rid="ref14">(Joulin et al., 2016)</xref>
        .
      </p>
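      <p>For illustration, a Python sketch of this pipeline follows. The concrete resources (spaCy's it_core_news_sm model and the cc.it.300.bin FastText vectors) are our assumptions, as the exact ones are not specified here; only the spaCy branch of the parsing is shown.</p>
      <preformat>
import re
import spacy      # assumes: python -m spacy download it_core_news_sm
import fasttext   # assumes the pretrained Italian vectors cc.it.300.bin

URL_RE = re.compile(r"https?://\S+")

nlp = spacy.load("it_core_news_sm")        # Italian dependency parser
ft = fasttext.load_model("cc.it.300.bin")  # pretrained FastText word vectors

def preprocess(tweet):
    text = URL_RE.sub("", tweet)           # URLs carry no informative content
    doc = nlp(text)                        # dependency parse of the sentence
    # One (embedding, dependency relation, head index) triple per token.
    return [(ft.get_word_vector(t.text), t.dep_, t.head.i) for t in doc]
      </preformat>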
    </sec>
    <sec id="sec-4">
      <title>4 Description of the system</title>
      <p>Our ensemble is composed of 13 different models: 12 are TreeESNs and the remaining one is a Long Short-Term Memory (LSTM) network over characters. Different random initializations (“trials”) of the model parameters are all included in the ensemble in order to enrich the diversity of the hypotheses. We summarize the entire configuration in Table 1.</p>
      <sec id="sec-4-1">
        <title>4.1 TreeESN models</title>
        <p>
          The TreeESN that we are using is a specialization of the description given by Gallicchio and Micheli (2013), and the reader can refer to that work for additional details. Here, the state corresponding to node n of an input tree t is computed as:
          $$x(n) = f\left(W_{in}\, u(n) + \frac{1}{k} \sum_{i=1}^{k} \hat{W}_{i}\, x(ch_i(n))\right), \quad (1)$$
          where u(n) is the label of node n in the input tree, k is the number of children of node n, ch_i(n) is the i-th child of node n, W_in is the input-to-reservoir weight matrix, Ŵ_i is the recurrent reservoir weight matrix associated to the grammatical relation between node n and its i-th child, and f is the element-wise activation function of the reservoir units (in our case, tanh). All matrices in Equation 1 are left untrained.
        </p>
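        <p>An illustrative NumPy sketch of Equation 1 follows; the sizes, the weight ranges, the relation set and the node interface (a node object with label, children and rel attributes) are our own choices, not details of the system.</p>
        <preformat>
import numpy as np

rng = np.random.default_rng(0)
N_R, N_U = 500, 300                          # reservoir and embedding sizes (ours)
W_in = rng.uniform(-0.1, 0.1, (N_R, N_U))    # untrained input-to-reservoir matrix
# One untrained recurrent matrix per grammatical relation, as in Eq. 1;
# in practice there is one entry per relation produced by the parser.
W_hat = {rel: rng.uniform(-0.1, 0.1, (N_R, N_R)) for rel in ("nsubj", "obj", "root")}

def tree_state(node):
    """Bottom-up visit of Eq. 1; `node` has .label, .children and .rel attributes."""
    s = W_in @ node.label
    k = len(node.children)
    if k > 0:
        s = s + sum(W_hat[c.rel] @ tree_state(c) for c in node.children) / k
    return np.tanh(s)
        </preformat>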
        <p>
          Note that Equation 1 determines a recursive application (bottom-up visit) over each node of the tree t until the state for all nodes is computed, which we can express in structured form as x(t). The resulting tree of states x(t) is then mapped into a fixed-size feature representation via the state mapping function χ. We make use of mean and sum state mapping functions, respectively yielding the mean and the sum of all the states. The result, χ(x(t)), is then projected into a different space by a matrix W:
          $$\hat{y} = f\left(W\, \chi(x(t))\right), \quad (2)$$
          where f is an activation function.
        </p>
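        <p>In code, the state mapping and the projection of Equation 2 can be sketched as follows; using tanh for f mirrors the reservoir activation and is our assumption, and node_states is assumed to collect the per-node states computed by the sketch above.</p>
        <preformat>
import numpy as np

def state_mapping(node_states, mode="mean"):
    """Mean or sum state mapping over all node states of a tree (chi in Eq. 2)."""
    S = np.stack(node_states)                # one row per node of the tree
    return S.mean(axis=0) if mode == "mean" else S.sum(axis=0)

# Projection of Eq. 2 (W is a fixed random matrix; tanh is our assumption for f):
# y_hat = np.tanh(W @ state_mapping(node_states))
        </preformat>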
        <p>
          For the readout we use both a linear regression approach with L2 regularization, known as Ridge regression
          <xref ref-type="bibr" rid="ref11">(Hoerl and Kennard, 1970)</xref>
          , and a multilayer perceptron (MLP):
          $$y = \mathrm{readout}(\hat{y}), \quad (3)$$
          where $y \in \mathbb{R}^{25}$ is the output vector, which represents a score for each of the classes: the index with the highest value corresponds to the most likely class.
        </p>
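        <p>A minimal closed-form sketch of the Ridge readout follows; the regularization strength and all names are ours.</p>
        <preformat>
import numpy as np

def fit_ridge(Phi, T, lam=1.0):
    """Closed-form Ridge regression: minimizes ||Phi W - T||^2 + lam ||W||^2.
    Phi: (n_samples, d) rows are the vectors y_hat of Eq. 2; T: (n_samples, 25)."""
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ T)

# Prediction: scores = phi @ W_out; the argmax over the 25 scores gives the class.
        </preformat>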
      </sec>
      <sec id="sec-4-2">
        <title>4.2 CharLSTM model</title>
        <p>
          The CharLSTM model uses a bidirectional LSTM
          <xref ref-type="bibr" rid="ref10 ref9">(Hochreiter and Schmidhuber, 1997; Graves and
Schmidhuber, 2005)</xref>
          with 2 layers, which takes as
input the characters of the sentences expressed as
pretrained character embeddings of size 300. The
LSTM output is then fed into a linear layer with
25 output units.
        </p>
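        <p>For concreteness, a PyTorch sketch of such a model follows; the hidden size, the use of the top layer's final states from both directions, and the trainable embedding table (loading of the pretrained character embeddings is omitted) are our assumptions.</p>
        <preformat>
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """2-layer bidirectional LSTM over character embeddings (a sketch)."""
    def __init__(self, n_chars, emb_dim=300, hidden=256, n_classes=25):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, char_ids):                   # (batch, seq_len)
        _, (h_n, _) = self.lstm(self.emb(char_ids))
        # h_n: (num_layers*2, batch, hidden); concatenate the two directions
        # of the top layer before the 25-unit linear output.
        top = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.out(top)                       # scores for the 25 emojis
        </preformat>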
        <p>Similar models have been used in recent works related to emoji prediction: see for example the model used by Barbieri et al. (2017), or the one by Baziotis et al. (2018), which is however a more complex word-based model.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Ensemble</title>
        <p>We take into consideration two different ensembles, both containing the models in Table 1, but with different strategies for weighting the $N_P$ predictions. In the following, let $Y \in \mathbb{R}^{N_P \times 25}$ be the matrix containing one prediction per row.</p>
        <p>[Table 1: configuration of the models in the ensemble; the recovered “Trials” column reads 10, 10, 1, 2, 1, 1, 1, 3, 1, 3, 1, 2, 1.]</p>
        <p>
          The weights for the first ensemble (corresponding to the run file run1.txt) have been produced by a random search: at each iteration we compute a random vector $w \in \mathbb{R}^{N_P}$, with entries sampled from a random variable $W^2$, $W \sim U[0, 1]$; the square increases the probability of sampling near-zero weights. After selecting the best configuration on the validation set, the predictions from each of the models are merged together in a weighted mean:
          $$y = wY. \quad (4)$$
        </p>
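        <p>A sketch of this search follows; stacking the validation predictions in a tensor and using scikit-learn's f1_score as the selection metric are our choices.</p>
        <preformat>
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def random_search_weights(Y_val, targets, n_iter=1000):
    """Y_val: (N_P, n_samples, 25) validation predictions; targets: (n_samples,)."""
    best_w, best_score = None, -1.0
    for _ in range(n_iter):
        w = rng.uniform(0.0, 1.0, Y_val.shape[0]) ** 2   # squared uniform weights
        y = np.tensordot(w, Y_val, axes=1)               # weighted mean of Eq. 4
        score = f1_score(targets, y.argmax(axis=-1), average="macro")
        if score > best_score:
            best_w, best_score = w, score
    return best_w
        </preformat>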
        <p>
          For the second type of ensemble (corresponding to the run file run2.txt) we adopt a multilayer perceptron. We feed as input the $N_P$ predictions concatenated into a single vector $y^{(1:N_P)} \in \mathbb{R}^{25 N_P}$, so that the model is:
          $$y = \tanh\left(y^{(1:N_P)} W_1 + b_1\right) W_2 + b_2, \quad (5)$$
          where the hidden layer has size 259 and the output layer is composed of 25 units.
        </p>
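        <p>In code, Equation 5 reduces to two affine maps around a tanh; the shapes below follow the text, while the function name is ours.</p>
        <preformat>
import numpy as np

def mlp_ensemble(Y, W1, b1, W2, b2):
    """Eq. 5: concatenated predictions, tanh hidden layer (259 units), 25 scores.
    Y: (N_P, 25); W1: (25*N_P, 259); b1: (259,); W2: (259, 25); b2: (25,)."""
    y_cat = Y.reshape(-1)                      # y^(1:N_P), the concatenated predictions
    return np.tanh(y_cat @ W1 + b1) @ W2 + b2
        </preformat>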
        <p>In both types of ensemble, as before, the output vector contains a score for each of the classes, providing a way to rank them from the most to the least likely. The most likely class is thus computed as $\tilde{c} = \arg\max_i y_i$.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Training</title>
      <p>The training algorithm differs based on the kind of
model taken under consideration. We address each
of them in the following paragraphs.</p>
      <p>
        Models 1-6 The first six models are TreeESNs using a multilayer perceptron as readout. Since the main evaluation metric for the competition is the Macro F1 score, each of the models has been trained by rebalancing the frequencies of the different target classes. In particular, the sampling probability for each input tree has been skewed so that the data extracted during training follows a uniform distribution with respect to the target class. For the readout part we use the Adam algorithm
        <xref ref-type="bibr" rid="ref15">(Kingma and Ba, 2015)</xref>
        for the stochastic optimization of the multi-class cross-entropy loss function.
      </p>
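      <p>A sketch of this rebalanced sampling follows; inverse-class-frequency weights are one standard way to obtain a uniform class distribution in expectation, and the batch size is illustrative.</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)

def rebalanced_sampling_probs(labels, n_classes=25):
    """Skew per-example sampling so batches are uniform over target classes."""
    labels = np.asarray(labels)
    counts = np.bincount(labels, minlength=n_classes)
    p = 1.0 / counts[labels]                   # inverse class frequency per example
    return p / p.sum()

# Example: draw a training batch whose class distribution is uniform in expectation.
# probs = rebalanced_sampling_probs(train_labels)
# batch_idx = rng.choice(len(train_labels), size=64, p=probs)
      </preformat>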
      <p>Models 7-10 Models from 7 to 10 are again
TreeESNs, but with a Ridge Regression
readout. In this case, 25 classifiers are trained with
a 1-vs-all method, one for each class, using binary
targets.</p>
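      <p>A self-contained sketch of this 1-vs-all scheme, using the same closed-form Ridge solution as in Section 4.1, follows; the 0/1 encoding is our reading of “binary targets”.</p>
      <preformat>
import numpy as np

def one_vs_all_ridge(Phi, labels, lam=1.0, n_classes=25):
    """25 binary Ridge classifiers, one per class, solved jointly in closed form.
    Phi: (n_samples, d) reservoir features; labels: (n_samples,) class indices."""
    T = np.zeros((Phi.shape[0], n_classes))
    T[np.arange(Phi.shape[0]), labels] = 1.0   # binary 1-vs-all targets
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ T)       # one readout column per class
      </preformat>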
      <p>Models 11-12 Models 11 and 12 are again
TreeESNs with a Ridge Regression readout, but
they are trained to distinguish only between the
most frequent class, the second most frequent
class and all the other classes aggregated together.
This is done to try to improve the ensemble
precision and recall for the top two classes.</p>
      <p>Model 13 The last model is a sequential LSTM over character embeddings. As in the first six models, the Adam algorithm is used to optimize the cross-entropy loss function.</p>
    </sec>
    <sec id="sec-6">
      <title>6 Results</title>
      <p>The ensemble seems to bring a substantial
improvement to the performance on the validation
set, as highlighted in Table 2. This is possible
thanks to the number and diversity of the
different models, as we can see in Figure 2 where we
show the Pearson correlation coefficients between
the predictions of the models in the ensemble.</p>
      <p>
        On the test set we scored substantially lower, with the Macro F1 and Coverage Error values reported in Table 3. These numbers are close to those obtained by the top two models applied to the Spanish language in the “Multilingual Emoji Prediction” task of the SemEval-2018 competition
        <xref ref-type="bibr" rid="ref2">(Barbieri et al., 2018)</xref>
        , with F1 scores of 22.36 and 18.73
        <xref ref-type="bibr" rid="ref5 ref6">(Çöltekin and Rama, 2018; Coster et al., 2018)</xref>
        . In Figure 3 we report the confusion matrix
(with values normalized over the columns to
address label imbalance) and the accuracy over the
top-N classes.
      </p>
      <p>An interesting characteristic of this approach,
though, is computation time: we were able to train
a TreeESN with 5000 reservoir units over 200,000
trees in just about 25 minutes, and this is without
exploiting parallelism between the trees.</p>
      <p>In ITAmoji 2018, our team ranked 3rd out of 5. Detailed results and rankings are available at http://bit.ly/ITAmoji18.</p>
    </sec>
    <sec id="sec-7">
      <title>7 Discussion and conclusions</title>
      <p>Different authors have highlighted the difference in performance between SVM models and (deep) neural models for emoji prediction, and more generally for text classification tasks, suggesting that simple models like SVMs are better able to capture the features which are most important for generalization: see for example the reports of the SemEval-2018 participants Çöltekin and Rama (2018) and Coster et al. (2018).</p>
      <p>In this work, instead, we approached the problem from the novel perspective of reservoir computing applied to the grammatical tree structure of the sentences. Despite a significant performance drop on the test set (probably due to overtraining: we observed that the Macro F1 exceeded 0.40 during training), we showed that, paired with a rich ensemble, the method is comparable to the results obtained in the past by other participants in similar competitions using very different models.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Miguel Ballesteros, and
          <string-name>
            <given-names>Horacio</given-names>
            <surname>Saggion</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Are Emojis Predictable?</article-title>
          <source>arXiv preprint arXiv:1702.07285</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Barbieri</surname>
          </string-name>
          , Jose Camacho-Collados, Francesco Ronzano, Luis Espinosa Anke, Miguel Ballesteros, Valerio Basile, Viviana Patti, and
          <string-name>
            <given-names>Horacio</given-names>
            <surname>Saggion</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>SemEval 2018 Task 2: Multilingual Emoji Prediction</article-title>
          .
          <source>In Proceedings of The 12th International Workshop on Semantic Evaluation</source>
          , pages
          <fpage>24</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Christos</given-names>
            <surname>Baziotis</surname>
          </string-name>
          , Nikos Athanasiou, Georgios Paraskevopoulos, Nikolaos Ellinas, Athanasia Kolovou, and
          <string-name>
            <given-names>Alexandros</given-names>
            <surname>Potamianos</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>NTUA-SLP at SemEval-2018 Task 2: Predicting Emojis using RNNs with Context-aware Attention</article-title>
          .
          <source>arXiv preprint arXiv:1804.06657</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Sabine</given-names>
            <surname>Buchholz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Erwin</given-names>
            <surname>Marsi</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>CoNLL-X shared task on Multilingual Dependency Parsing</article-title>
          .
          <source>In Proceedings of the Tenth Conference on Computational Natural Language Learning</source>
          , pages
          <fpage>149</fpage>
          -
          <lpage>164</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Çağrı</given-names>
            <surname>Çöltekin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Taraka</given-names>
            <surname>Rama</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Tübingen-Oslo at SemEval-2018 Task 2: SVMs perform better than RNNs in Emoji Prediction</article-title>
          .
          <source>In Proceedings of The 12th International Workshop on Semantic Evaluation</source>
          , pages
          <fpage>34</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Joël</given-names>
            <surname>Coster</surname>
          </string-name>
          , Reinder Gerard Dalen, and Nathalie Adriënne Jacqueline Stierman
          .
          <year>2018</year>
          .
          <article-title>Hatching Chick at SemEval-2018 Task 2: Multilingual Emoji Prediction</article-title>
          .
          <source>In Proceedings of The 12th International Workshop on Semantic Evaluation</source>
          , pages
          <fpage>445</fpage>
          -
          <lpage>448</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Gallicchio</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alessio</given-names>
            <surname>Micheli</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Architectural and Markovian factors of echo state networks</article-title>
          .
          <source>Neural Networks</source>
          ,
          <volume>24</volume>
          (
          <issue>5</issue>
          ):
          <fpage>440</fpage>
          -
          <lpage>456</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Gallicchio</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alessio</given-names>
            <surname>Micheli</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Tree Echo State Networks</article-title>
          .
          <source>Neurocomputing</source>
          ,
          <volume>101</volume>
          :
          <fpage>319</fpage>
          -
          <lpage>337</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Alex</given-names>
            <surname>Graves</surname>
          </string-name>
          and Jürgen Schmidhuber.
          <year>2005</year>
          .
          <article-title>Framewise phoneme classification with bidirectional LSTM networks</article-title>
          .
          <source>In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks (IJCNN '05)</source>
          , volume
          <volume>4</volume>
          , pages
          <fpage>2047</fpage>
          -
          <lpage>2052</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and Jürgen Schmidhuber.
          <year>1997</year>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Arthur E.</given-names>
            <surname>Hoerl</surname>
          </string-name>
          and
          <string-name>
            <given-names>Robert W.</given-names>
            <surname>Kennard</surname>
          </string-name>
          .
          <year>1970</year>
          .
          <article-title>Ridge regression: Biased estimation for nonorthogonal problems</article-title>
          .
          <source>Technometrics</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ):
          <fpage>55</fpage>
          -
          <lpage>67</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Honnibal</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Johnson</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>An Improved Non-monotonic Transition System for Dependency Parsing</article-title>
          .
          <source>In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>1373</fpage>
          -
          <lpage>1378</lpage>
          , Lisbon, Portugal, September. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Herbert</given-names>
            <surname>Jaeger</surname>
          </string-name>
          and
          <string-name>
            <given-names>Harald</given-names>
            <surname>Haas</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication</article-title>
          .
          <source>Science</source>
          ,
          <volume>304</volume>
          (
          <issue>5667</issue>
          ):
          <fpage>78</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Armand</given-names>
            <surname>Joulin</surname>
          </string-name>
          , Edouard Grave, Piotr Bojanowski, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Bag of Tricks for Efficient Text Classification</article-title>
          .
          <source>arXiv preprint arXiv:1607</source>
          .
          <fpage>01759</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Diederik P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jimmy Lei</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>In Proceedings of the 3rd International Conference on Learning Representations (ICLR).</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Mantas</given-names>
            <surname>Lukoševičius</surname>
          </string-name>
          and
          <string-name>
            <given-names>Herbert</given-names>
            <surname>Jaeger</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Reservoir computing approaches to recurrent neural network training</article-title>
          .
          <source>Computer Science Review</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ):
          <fpage>127</fpage>
          -
          <lpage>149</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Joakim</given-names>
            <surname>Nivre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Marie-Catherine</given-names>
            <surname>de Marneffe</surname>
          </string-name>
          , Filip Ginter, Yoav Goldberg, Jan Hajič, Christopher D. Manning, Ryan T. McDonald, Slav Petrov, Sampo Pyysalo,
          <string-name>
            <given-names>Natalia</given-names>
            <surname>Silveira</surname>
          </string-name>
          , et al.
          <year>2016</year>
          .
          <article-title>Universal Dependencies v1: A Multilingual Treebank Collection</article-title>
          .
          <source>In LREC.</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>A. Palmero</given-names>
            <surname>Aprosio</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Moretti</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Italy goes to Stanford: a collection of CoreNLP modules for Italian</article-title>
          . ArXiv e-prints,
          <year>September</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Francesco</given-names>
            <surname>Ronzano</surname>
          </string-name>
          , Francesco Barbieri, Endang Wahyu Pamungkas, Viviana Patti, and
          <string-name>
            <given-names>Francesca</given-names>
            <surname>Chiusaroli</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Italian Emoji Prediction (ITAMoji) Task</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>