<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the EVALITA 2016 Part Of Speech on TWitter for ITAlian Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cristina Bosco</string-name>
          <email>bosco@di.unito.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Tamburini</string-name>
          <email>fabio.tamburini@unibo.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Bolioli</string-name>
          <email>abolioli@celi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Mazzei</string-name>
          <email>mazzei@di.unito.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CELI</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dip. di Informatica, Università di Torino</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>FICLIT, University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The increasing interest in extracting various forms of knowledge from microblogs and social media makes it crucial to develop resources and tools for processing them automatically. PoSTWITA contributes to advancing the state of the art for the Italian language by: (a) providing the community with a previously non-existent collection of data extracted from Twitter and annotated with grammatical categories, to be used as a benchmark for system evaluation; (b) supporting the adaptation of Part of Speech tagging systems to this particular text domain.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>The growing relevance of extracting various forms of knowledge from microblog and social media texts makes it crucial to develop tools and resources for their automatic processing. PoSTWITA aims to contribute to the advancement of the state of the art for the Italian language in two ways: (a) by providing the community with a collection of data extracted from Twitter and annotated with grammatical categories, a previously non-existent resource, to be used as a benchmark for system evaluation; (b) by promoting the adaptation of the Part of Speech tagging systems participating in the task to this particular text domain.</p>
      <p>The order of authors was decided by coin toss.</p>
    </sec>
    <sec id="sec-2">
      <title>1 Introduction and motivation</title>
      <p>In the past, work on Part-of-Speech (PoS) tagging has mainly focused on texts characterised by standard forms and syntax. However, in the last few years interest in the automatic processing of social media texts, in particular from microblogging platforms such as Twitter, has grown considerably: so-called user-generated content has already proved useful in a variety of applications, such as identifying trends and upcoming events in various fields.</p>
      <p>
        As social media texts clearly differ from standardised texts, both in the nature of their lexical items and in their distributional properties (short messages, emoticons and mentions, threaded messages, etc.), Natural Language Processing methods need to be adapted to deal with them if reliable results are to be obtained. Such an adaptation relies on tagged social media text corpora
        <xref ref-type="bibr" rid="ref12">(Neunerdt et al., 2013)</xref>
        for training and testing automatic procedures. Although various attempts to produce such specialised resources and tools for other languages are described in the literature (e.g.
        <xref ref-type="bibr" rid="ref12 ref13 ref7 ref8">Gimpel et al., 2011; Derczynski et al., 2013; Neunerdt et al., 2013; Owoputi et al., 2013</xref>
        ), Italian currently lacks both.
      </p>
      <p>For all the above reasons, we proposed a task for EVALITA 2016 on the domain adaptation of PoS taggers to Twitter texts. Participants in the evaluation campaign were required to use two datasets provided by the organizers to set up their systems: the first, henceforth referred to as the Development Set (DS), contains data manually annotated with a specific tagset (see section 2.2 for the tagset description) and had to be used to train the participants' systems; the second, referred to as the Test Set (TS), contains the test data in blind format and was given to participants on the date scheduled for the evaluation.</p>
      <p>To focus the task on the challenges related to PoS tagging, and also to avoid the problem of tweets that are no longer available, the tweets were distributed already tokenised, with each token on a separate line.</p>
      <p>Moreover, in keeping with an “open task” perspective, participants were allowed to use resources other than those released for the task, both for training and to improve final performance, as long as their results conform to the proposed tagset.</p>
      <p>The paper is organised as follows. The next section describes the data used in the task, the annotation process, and the issues related to the tokenisation and tagging applied to the dataset. The following sections describe the evaluation metric and the participants' results. Finally, we discuss the main issues involved in PoSTWITA.</p>
    </sec>
    <sec id="sec-3">
      <title>2 Data Description</title>
      <p>
        For the corpus of the proposed task, we collected tweets from the dataset of the EVALITA2014 SENTIment POLarity Classification (SENTIPOLC) task
        <xref ref-type="bibr" rid="ref2">(Basile et al., 2014)</xref>
        , benefiting from the fact that it had been cleaned of repetitions and other possible sources of noise. The SENTIPOLC corpus originates from a set of randomly collected tweets (Twita)
        <xref ref-type="bibr" rid="ref4">(Basile and Nissim, 2013)</xref>
        and a set of posts extracted using specific keywords and hashtags marking political topics (SentiTUT)
        <xref ref-type="bibr" rid="ref5">(Bosco et al., 2013)</xref>
        .
      </p>
      <p>
        With a view to developing a benchmark on which a full pipeline of NLP tools can be applied and tested in the future, the same selection of tweets has been used in other EVALITA2016 tasks, namely the EVALITA 2016 SENTiment POLarity Classification Task (SENTIPOLC)
        <xref ref-type="bibr" rid="ref1">(Barbieri et al., 2016)</xref>
        , Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT)
        <xref ref-type="bibr" rid="ref3">(Basile et al., 2016)</xref>
        and the Event Factuality Annotation Task (FactA)
        <xref ref-type="bibr" rid="ref11">(Minard et al., 2016)</xref>
        .
      </p>
      <p>Both the development and the test sets of EVALITA2016 have been manually annotated with PoS tags. The former, distributed as the DS for PoSTWITA, includes 6,438 tweets (114,967 tokens); the latter, i.e. the TS, is composed of 300 tweets (4,759 tokens).</p>
      <p>
        The tokenisation and annotation of all data were first carried out by automatic tools, with a high error rate due to the features of the domain and text genre. We adapted the Tweet-NLP tokeniser
        <xref ref-type="bibr" rid="ref8">(Gimpel et al., 2011)</xref>
        to Italian for token segmentation and used the TnT tagger
        <xref ref-type="bibr" rid="ref6">(Brants, 2000)</xref>
        , trained on the Universal Dependencies corpus (v1.3), for the first PoS-tagging step (see also section 2.2).
      </p>
      <p>The necessary manual correction was then carried out by two skilled annotators working independently on the data. Their versions were compared in order to detect disagreements, conflicts or residual errors, which were finally resolved by a third annotator.</p>
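      <p>The comparison of the two independent annotations can be illustrated with a minimal Python sketch. It assumes that both annotators produced the same tokenisation for a tweet and simply lists the token positions whose tags differ, which would then be passed to the third annotator; the function name and the (token, tag) data layout are hypothetical, not the organizers' actual tooling.</p>
```python
from typing import List, Tuple

# Hypothetical representation: one tweet as a list of (token, tag) pairs.
Annotation = List[Tuple[str, str]]


def disagreements(ann_a: Annotation, ann_b: Annotation) -> List[int]:
    """Return the positions where two independent annotations of a tweet differ."""
    tokens_a = [tok for tok, _ in ann_a]
    tokens_b = [tok for tok, _ in ann_b]
    if tokens_a != tokens_b:
        # This sketch assumes identical tokenisation; real data may also differ there.
        raise ValueError("the two annotations are not aligned on the same tokens")
    return [i for i, ((_, tag_a), (_, tag_b)) in enumerate(zip(ann_a, ann_b)) if tag_a != tag_b]
```
      <p>For example, if the two versions of "Governo Monti decreto" differ only on the tag of Monti (PROPN vs NOUN), the function returns [1], the index of that token.</p>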
      <p>Nevertheless, since the PoSTWITA datasets were developed from scratch as far as tokenisation and grammatical-category annotation are concerned, we expected a few residual errors to remain even after the three phases of the annotation process described above. Therefore, during the evaluation campaign and before the date scheduled for the evaluation, all participants were encouraged to report to the organizers any errors found in the DS. This allowed the organizers (but not the participants) to update the DS and redistribute it to the participants in an improved form.</p>
      <p>No lexical resource has been distributed with the PoSTWITA 2016 data, since each participant is allowed to use any available lexical resource or to induce one freely from the training data.</p>
      <p>All the data are provided as plain text files in UNIX format (so attention must be paid to the newline character format), tokenised as described in section 2.1, but only the DS has been released with the PoS tags described in section 2.2. The TS contains only the tokenised words without tags; these had to be added by the participating systems and submitted for the evaluation. The correctly tokenised and tagged version of the TS (the gold standard TS) used for the evaluation was provided to the participants after the end of the contest, together with their scores.</p>
      <p>Consistently with the treatment in the dataset from which our data are extracted, each tweet in the PoSTWITA corpus is considered a separate entity and thread integrity is not preserved; taggers participating in the contest therefore have to process each tweet separately.</p>
      <sec id="sec-3-1">
        <title>2.1 Tokenisation Issues</title>
        <p>Text segmentation (tokenisation) is a central issue in PoS-tagger evaluation and comparison. In practical applications, each system may apply its own tokenisation rules, leading to different outputs that are hard to compare.</p>
        <p>We provided in the evaluation campaign all the
development and test data in tokenised format,
one token per line followed by its tag (when
applicable), following the schema:</p>
        <preformat>ID TWEET 1
&lt;TOKEN 1&gt; &lt;TAG1&gt;
&lt;TOKEN 2&gt; &lt;TAG2&gt;
&lt;TOKEN 3&gt; &lt;TAG3&gt;
&lt;TOKEN 4&gt; &lt;TAG4&gt;
&lt;TOKEN 5&gt; &lt;TAG5&gt;
&lt;TOKEN 6&gt; &lt;TAG6&gt;
&lt;TOKEN 7&gt; &lt;TAG7&gt;
&lt;TOKEN 8&gt; &lt;TAG8&gt;
&lt;TOKEN 9&gt; &lt;TAG9&gt;
&lt;TOKEN 10&gt; &lt;TAG10&gt;

ID TWEET 2
&lt;TOKEN 1&gt; &lt;TAG1&gt;
&lt;TOKEN 2&gt; &lt;TAG2&gt;
&lt;TOKEN 3&gt; &lt;TAG3&gt;
&lt;TOKEN 4&gt; &lt;TAG4&gt;
&lt;TOKEN n&gt; &lt;TAGn&gt;</preformat>
        <preformat>162545185920778240
Governo PROPN
Monti PROPN
: PUNCT
decreto NOUN
in ADP
cdm PROPN
per ADP
approvazione NOUN
! PUNCT
http://t.co/Z76KLLGP URL

192902763032743936
#Ferrara HASHTAG
critica VERB
#Grillo HASHTAG
perché SCONJ
...</preformat>
        <p>The first line of each tweet contains the tweet ID, while the line following its last token is empty, in order to separate each post from the next. The example above shows some tokenisation and formatting issues, in particular:</p>
        <list list-type="bullet">
          <list-item>
            <p>accents, which are encoded in UTF-8;</p>
          </list-item>
          <list-item>
            <p>the apostrophe, which is tokenised separately only when used as a quotation mark, not when signalling an elided character (as in dell’/orto, where dell’ remains a single token).</p>
          </list-item>
        </list>
        <p>All the other features of the data annotation are described in detail in the following parts of this section.</p>
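        <p>The format described above can be read with a few lines of code. The following is a minimal Python sketch, not the official task tooling; it assumes that token and tag are separated by whitespace (the exact separator is not specified here) and that tokens never contain spaces. Because of the newline caveat mentioned earlier, it also tolerates Windows-style line endings.</p>
```python
from typing import Iterable, Iterator, List, Optional, Tuple

# One tweet: its ID and a list of (token, tag) pairs; tag is None in the blind TS.
Tweet = Tuple[str, List[Tuple[str, Optional[str]]]]


def read_postwita(lines: Iterable[str]) -> Iterator[Tweet]:
    """Parse PoSTWITA-style data: an ID line, one token per line, blank line between tweets."""
    tweet_id: Optional[str] = None
    tokens: List[Tuple[str, Optional[str]]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")  # also tolerates Windows (CRLF) line endings
        if not line.strip():
            # A blank line closes the current tweet, if any.
            if tweet_id is not None:
                yield tweet_id, tokens
            tweet_id, tokens = None, []
        elif tweet_id is None:
            tweet_id = line.strip()  # the first line of each block is the tweet ID
        else:
            fields = line.split()
            if len(fields) == 1:
                tokens.append((fields[0], None))  # blind TS: token without tag
            elif len(fields) == 2:
                tokens.append((fields[0], fields[1]))  # DS or gold TS: token and tag
            else:
                raise ValueError(f"unexpected line: {line!r}")
    if tweet_id is not None:  # the file may not end with a blank line
        yield tweet_id, tokens
```
        <p>Passing an open file object works the same way as passing a list of lines, since the function only iterates over its input.</p>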
        <p>As regards tokenisation and tagging principles in EVALITA2016 PoSTWITA, we decided to follow the strategy proposed in the Universal Dependencies (UD) project for Italian (<ext-link ext-link-type="uri" xlink:href="http://universaldependencies.org/it/pos/index.html">http://universaldependencies.org/it/pos/index.html</ext-link>), applying only minor changes motivated by the special features of the domain addressed in the task. This makes the EVALITA2016 PoSTWITA gold standard annotation compliant with the other UD datasets and strongly improves the portability of our newly developed datasets towards this standard.</p>
        <p>Adopting, as is usual and more suitable in PoS tagging, a neutral perspective with respect to the solution of parsing problems (which are more relevant when building treebanks), we departed from the UD format by keeping the word unsplit, rather than splitting it into different tokens, in the following two cases:</p>
        <list list-type="bullet">
          <list-item>
            <p>articulated prepositions (e.g. dalla (from-the[fem]), nell’ (in-the[masc]), al (to-the), ...);</p>
          </list-item>
          <list-item>
            <p>clitic clusters, which can be attached to the end of a verb form (e.g. regalaglielo (gift-to-him-it), dandolo (giving-it), ...).</p>
          </list-item>
        </list>
        <p>For this reason, we also defined two novel specific tags to be assigned in these cases (see Table 1): ADP_A for articulated prepositions and VERB_CLIT for clitic clusters, following the strategy adopted in previous EVALITA PoS tagging evaluations.</p>
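        <p>Since UD represents these words as multiword tokens split into parts, while PoSTWITA keeps them whole, UD-style annotations have to be mapped to the two new tags. The Python sketch below shows one possible mapping, assuming that the UPOS tags of the parts of each multiword token are available (for dalla, ADP + DET; for dandolo, VERB + PRON). The function name is hypothetical, and any other combination of parts is simply rejected.</p>
```python
from typing import Sequence


def postwita_fused_tag(part_tags: Sequence[str]) -> str:
    """Map the UPOS tags of the parts of a UD multiword token to one PoSTWITA tag."""
    head, rest = part_tags[0], list(part_tags[1:])
    if head == "ADP" and rest == ["DET"]:
        # Articulated preposition, e.g. dalla = da (ADP) + la (DET).
        return "ADP_A"
    if head == "VERB" and rest and all(tag == "PRON" for tag in rest):
        # Verb with an attached clitic cluster, e.g. regalaglielo = regala + glie + lo.
        return "VERB_CLIT"
    raise ValueError(f"no PoSTWITA fused tag for parts {list(part_tags)}")
```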
        <p>Participants were required to return the test file using exactly the same tokenisation format and containing exactly the same number of tokens. The comparison with the reference file was performed line by line, so any misalignment produces wrong results.</p>
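        <p>A line-by-line comparison of this kind, combined with the tagging accuracy metric used in the evaluation, can be sketched in Python as follows. The sketch is an illustration, not the official scorer: it assumes that tweet-ID lines and blank separator lines carry no tag and are skipped, and that any token mismatch means the submission is misaligned.</p>
```python
from typing import List


def tagging_accuracy(gold_lines: List[str], system_lines: List[str]) -> float:
    """Compare a system file with the gold file line by line and return tagging accuracy."""
    if len(gold_lines) != len(system_lines):
        raise ValueError("different number of lines: the submission is misaligned")
    correct = total = 0
    for gold, system in zip(gold_lines, system_lines):
        gold_fields, system_fields = gold.split(), system.split()
        if len(gold_fields) != 2:
            continue  # tweet-ID lines and blank separators carry no tag
        if not system_fields or system_fields[0] != gold_fields[0]:
            raise ValueError(f"token mismatch: {gold!r} vs {system!r}")
        total += 1
        if len(system_fields) == 2 and system_fields[1] == gold_fields[1]:
            correct += 1
    if total == 0:
        raise ValueError("no tagged tokens found in the gold file")
    return correct / total
```
        <p>With the official figures, 4,435 correctly tagged tokens out of the 4,759 tokens of the TS give 4435 / 4759 ≈ 0.9319, i.e. the 93.19% reported for the best system.</p>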
      </sec>
      <sec id="sec-3-2">
        <title>2.2 Tagset</title>
        <p>Beyond the novel labels introduced above for articulated prepositions and clitic clusters, which are motivated by tokenisation issues, further modifications of the UD tagset are motivated by the need for more specific labels to represent phenomena that frequently occur in social media texts. We therefore introduced new Twitter-specific tags for cases that, following the UD specifications, would all be classified into the generic SYM (symbol) class, namely emoticons, Internet addresses, email addresses, hashtags and mentions (EMO, URL, EMAIL, HASHTAG and MENTION). See Table 1 for a complete description of the PoSTWITA tagset.</p>
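        <p>Because these five classes are largely identifiable from the surface form of a token, a tagger could pre-assign them with simple rules before statistical tagging. The Python sketch below illustrates this idea; the regular expressions are assumptions chosen for illustration (the emoticon pattern in particular covers only a few Western-style emoticons) and are not part of the PoSTWITA guidelines.</p>
```python
import re
from typing import Optional

# Illustrative patterns (assumptions), tried in order; EMAIL is checked before MENTION.
_PATTERNS = [
    ("URL", re.compile(r"(https?://|www\.)\S+", re.IGNORECASE)),
    ("EMAIL", re.compile(r"[\w.+-]+@[\w-]+(\.[\w-]+)+")),
    ("MENTION", re.compile(r"@\w+")),
    ("HASHTAG", re.compile(r"#\w+")),
    ("EMO", re.compile(r"[:;=8][-o^']?[()\[\]DPpOo*|/\\]+")),
]


def twitter_tag(token: str) -> Optional[str]:
    """Return EMO, URL, EMAIL, HASHTAG or MENTION if the whole token matches, else None."""
    for tag, pattern in _PATTERNS:
        if pattern.fullmatch(token):
            return tag
    return None
```
        <p>Tokens for which twitter_tag returns None would be left to the statistical tagger.</p>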
        <p>In the following, we report the most challenging issues addressed in the development of our datasets, i.e. the management of proper nouns and of foreign words.</p>
      </sec>
      <sec id="sec-3-3">
        <title>2.2.1 Proper Noun Management</title>
        <p>The annotation of named entities (NE) poses a number of relevant problems for tokenisation and PoS tagging. The most coherent way to handle such phenomena would be to consider each NE as a single token and assign it the PROPN tag. Unfortunately, this is not a viable solution for this evaluation task; moreover, adopting it would lose many useful generalisations over n-gram sequences (e.g. Ministero/dell’/Interno PROPN/ADP_A/PROPN). Nevertheless, the annotation of sequences like Banca Popolare and Presidente della Repubblica Italiana deserves some attention and a clear policy.</p>
        <p>Following the approach applied in the EVALITA 2007 PoS tagging task, we annotate as PROPN those words of an NE that begin with an uppercase letter, as in the following examples:</p>
        <p>In some other cases, however, the uppercase initial has not been considered sufficient to introduce a PROPN tag: “...anche nei Paesi dove...” (“...also in the countries where...”), “...in contraddizione con lo Stato sociale...” (“...in contradiction with the welfare state...”).</p>
        <p>This strategy aims to produce a dataset that incorporates speakers' linguistic intuitions about this kind of structure, regardless of whether the knowledge involved can be formalised for automatic processing.</p>
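        <p>The basic capitalisation rule can be expressed as a small Python sketch, assuming that the NE span has already been identified and pre-tagged. It deliberately does not capture the exceptions discussed above (e.g. Paesi or Stato sociale), which reflect the annotators' judgement rather than a formal rule; the function name is hypothetical.</p>
```python
from typing import List, Tuple


def retag_ne_span(span: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Inside a named-entity span, tag capitalised words as PROPN and keep other tags."""
    return [(tok, "PROPN" if tok[:1].isupper() else tag) for tok, tag in span]
```
        <p>For Presidente della Repubblica Italiana, with della already tagged ADP_A, this yields PROPN ADP_A PROPN PROPN, whatever tags the capitalised words had before.</p>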
      </sec>
      <sec id="sec-3-4">
        <title>2.2.2 Foreign words</title>
        <p>Non-Italian words are annotated, when possible, following the same PoS tagging criteria adopted in the UD guidelines for the language in question. For instance, good-bye is marked as an interjection with the label INTJ.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3 Evaluation Metrics</title>
      <p>The evaluation is performed in a black-box approach: only the systems' output is evaluated. The evaluation metric is based on a token-by-token comparison, and only a single tag is allowed for each token. The metric considered is tagging accuracy, defined as the number of correct PoS tag assignments divided by the total number of tokens in the TS.</p>
    </sec>
    <sec id="sec-4-teams">
      <title>4 Teams and Results</title>
      <p>Sixteen teams registered for this task, but only 9 submitted a final run for the evaluation. Table 2 outlines the participants' main data: 7 participating teams belong to universities or other research centres, while the other 2 represent private companies working in the NLP and speech processing fields.</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <caption>
          <p>Participating teams and affiliations.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Team</th><th>Members</th><th>Affiliation</th></tr>
          </thead>
          <tbody>
            <tr><td/><td>E.W. Stemle</td><td>Inst. for Specialised Commun. and Multilingualism, EURAC Research, Bolzano/Bozen, Italy</td></tr>
            <tr><td>ILABS</td><td>C. Aliprandi, L. De Mattei</td><td>Integris Srl, Roma, Italy</td></tr>
            <tr><td>ILC-CNR</td><td>A. Cimino, F. Dell’Orletta</td><td>Istituto di Linguistica Computazionale Antonio Zampolli, CNR, Pisa, Italy</td></tr>
            <tr><td>MIVOQ</td><td>Giulio Paci</td><td>Mivoq Srl, Padova, Italy</td></tr>
            <tr><td>NITMZ</td><td>P. Pakray, G. Majumder</td><td>Dept. of Computer Science &amp; Engg., Nat. Inst. of Tech. Mizoram, Aizawl, India</td></tr>
            <tr><td>UniBologna</td><td>F. Tamburini</td><td>FICLIT, University of Bologna, Italy</td></tr>
            <tr><td>UniDuisburg</td><td>T. Horsmann, T. Zesch</td><td>Language Technology Lab, Dept. of Comp. Science and Appl. Cog. Science, Univ. of Duisburg-Essen, Germany</td></tr>
            <tr><td>UniGroningen</td><td>B. Plank, M. Nissim</td><td>University of Groningen, The Netherlands</td></tr>
            <tr><td>UniPisa</td><td>G. Attardi, M. Simi</td><td>Dipartimento di Informatica, Università di Pisa, Italy</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Table 3 describes the main features of the evaluated systems with respect to the core methods and the additional resources employed to develop them.</p>
      <p>Table 4 reports the final results of the PoSTWITA task of the EVALITA2016 evaluation campaign. Each team was allowed to submit a single “official” result and, optionally, one “unofficial” result (“UnOFF” in the table): UniBologna, UniGroningen, UniPisa and UniDuisburg decided to submit an additional unofficial result. The best result was achieved by the ILC-CNR group (93.19%, corresponding to 4,435 correct tokens out of 4,759).</p>
    </sec>
    <sec id="sec-5">
      <title>5 Discussion and Conclusions</title>
      <p>Looking at the results, we can draw some provisional conclusions about the PoS tagging of Italian tweets:</p>
      <list list-type="bullet">
        <list-item>
          <p>as expected, the performance of automatic PoS taggers on tweets is lower than on standard texts, but in line with the state of the art for other languages;</p>
        </list-item>
        <list-item>
          <p>all the top-performing systems are based on Deep Neural Networks and, in particular, on Long Short-Term Memories (LSTM)
            <xref ref-type="bibr" rid="ref10 ref9">(Hochreiter and Schmidhuber, 1997; Graves and Schmidhuber, 2005)</xref>
            ;</p>
        </list-item>
        <list-item>
          <p>most systems use word or character embeddings as inputs;</p>
        </list-item>
        <list-item>
          <p>almost all the presented systems make use of additional resources or knowledge (morphological analysers, additional tagged corpora and/or large non-annotated Twitter corpora).</p>
        </list-item>
      </list>
      <p>Looking at the official results and comparing them with the experiments the participants carried out to set up their own systems (not reported here; see the participants’ reports), a large difference in performance emerges. During the setup phase, most of the top-performing systems obtained consistent results well above 95-96% accuracy on the development set (either splitting it into a training/validation pair or using cross-validation), while the best-performing system in the official evaluation reached slightly above 93%. This is a huge difference for this kind of task, rarely observed in the literature.</p>
      <p>One possible reason for this difference in performance concerns the kind of documents in the test set. We inherited the development set from the SENTIPOLC task at EVALITA2014 and the test set from SENTIPOLC2016; the two corpora, developed at different times and using different criteria, may contain different kinds of documents. Differences in lexicon, genre, etc. could have affected the training phase of the taggers, leading to lower results in the evaluation phase.</p>
      <table-wrap id="tab3">
        <label>Table 3</label>
        <caption>
          <p>Core methods and resources (other than the DS) used by the participating systems.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Team</th><th>Core methods</th><th>Resources (other than DS)</th></tr>
          </thead>
          <tbody>
            <tr><td/><td>LSTM NN (word&amp;char embeddings)</td><td>DiDi-IT</td></tr>
            <tr><td>ILABS</td><td>Perceptron algorithm</td><td>word features extracted from proprietary resources and 250k entries of Wiktionary</td></tr>
            <tr><td>ILC-CNR</td><td>two-branch BiLSTM NN (word&amp;char embeddings)</td><td>Morphological Analyser (65,500 lemmas) + itWaC corpus</td></tr>
            <tr><td>MIVOQ</td><td>Tagger combination based on Yamcha</td><td>Evalita2009 PoS-tagged data; ISTC pronunciation dictionary</td></tr>
            <tr><td>NITMZ</td><td>HMM bigram model</td><td/></tr>
            <tr><td>UniBologna</td><td>Stacked BiLSTM NN + CRF (augmented word embeddings)</td><td>Morphological Analyser (110,000 lemmas) + 200Mw Twitter corpus</td></tr>
            <tr><td>UniDuisburg</td><td>CRF classifier</td><td>400Mw Twitter corpus</td></tr>
            <tr><td>UniGroningen</td><td>BiLSTM NN (word embeddings)</td><td>Universal Dependencies v1.3; 74kw tagged Facebook corpus</td></tr>
            <tr><td>UniPisa</td><td>BiLSTM NN + CRF (word&amp;char embeddings)</td><td>423kw tagged mixed corpus; 141Mw Twitter corpus</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Barbieri</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Croce</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Novielli</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Overview of the EVALITA 2016 SENTiment POLarity Classification Task</article-title>
          .
          <source>In Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bolioli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <year>2014</year>
          .
          <article-title>Overview of the Evalita 2014 SENTIment POLarity Classification Task</article-title>
          .
          <source>In Proceedings of Evalita</source>
          <year>2014</year>
          ,
          <fpage>50</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caputo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gentile</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rizzo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>Overview of the EVALITA 2016 Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) Task</article-title>
          .
          <source>In Proceedings of Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Basile</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nissim</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Sentiment analysis on Italian tweets</article-title>
          .
          <source>In Proceedings of the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Bosco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patti</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bolioli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Developing Corpora for Sentiment Analysis: The Case of Irony and SentiTUT</article-title>
          .
          <source>IEEE Intelligent Systems, special issue on Knowledge-based approaches to content-level sentiment analysis</source>
          ,
          <volume>28</volume>
          (
          <issue>2</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Brants</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2000</year>
          .
          <article-title>TnT - A Statistical Part-of-Speech Tagger</article-title>
          .
          <source>In Proceedings of the 6th Applied Natural Language Processing Conference.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Derczynski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritter</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontcheva</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data</article-title>
          .
          <source>In Proceedings of RANLP</source>
          <year>2013</year>
          ,
          <fpage>198</fpage>
          -
          <lpage>206</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Gimpel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Connor</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mills</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eisenstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heilman</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yogatama</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flanigan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          <year>2011</year>
          .
          <article-title>Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments</article-title>
          .
          <source>In Proceedings of ACL</source>
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Graves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>2005</year>
          .
          <article-title>Framewise phoneme classification with bidirectional LSTM and other neural network architectures</article-title>
          .
          <source>Neural Networks</source>
          ,
          <volume>18</volume>
          (
          <issue>5-6</issue>
          ),
          <fpage>602</fpage>
          -
          <lpage>610</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <year>1997</year>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ),
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Minard</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Speranza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caselli</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <year>2016</year>
          .
          <article-title>The EVALITA 2016 Event Factuality Annotation Task (FactA)</article-title>
          .
          <source>In Proceedings of the Third Italian Conference on Computational Linguistics (CLiC-it 2016) &amp; Fifth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2016)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Neunerdt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trevisan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mathar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Part-of-speech tagging for social media texts</article-title>
          .
          <source>Language Processing and Knowledge in the Web</source>
          . Springer,
          <fpage>139</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Owoputi</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Connor</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gimpel</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          <year>2013</year>
          .
          <article-title>Improved Part-of-Speech Tagging for Online Conversational Text with Word Clusters</article-title>
          .
          <source>In Proceedings of NAACL</source>
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>