<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>(Better than) State-of-the-Art PoS-tagging for Italian Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fabio Tamburini</string-name>
          <email>fabio.tamburini@unibo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>FICLIT - University of Bologna</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents some experiments for the construction of a high-performance PoS-tagger for Italian using deep neural network (DNN) techniques integrated with a powerful Italian morphological analyser. The results obtained by the proposed system on standard datasets taken from the EVALITA campaigns show large accuracy improvements when compared with previous systems from the literature.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        In recent years a large number of works have tried
to push the accuracy of the PoS-tagging
task forward using new techniques, mainly from
the deep learning domain
        <xref ref-type="bibr" rid="ref19 ref3 ref4 ref5 ref8">(Collobert et al., 2011;
Søgaard, 2011; dos Santos and Zadrozny, 2014;
Huang et al., 2015; Wang et al., 2015; Chiu and
Nichols, 2016)</xref>
        .
      </p>
      <p>
        All these studies are mainly devoted to showing
how to find the best combination of new
neural network structures and character/word
embeddings for reaching the highest classification
performance, and typically present solutions that do
not make any use of language-specific resources
(e.g. morphological analysers, gazetteers,
guessing procedures for unknown words, etc.). This is,
in general, a very desirable feature because it
allows for the production of tools not tied to any
specific language. However, in various evaluation
campaigns, at least for highly-inflected languages such as
Italian, the results showed quite clearly that this
task benefits from the use of specific and rich
language resources
        <xref ref-type="bibr" rid="ref1 ref17 ref18">(Tamburini, 2007; Attardi and
Simi, 2009)</xref>
        .
      </p>
      <p>
        In this study, which is still work in progress, we set up
a PoS-tagger for Italian that aims at the highest
classification performance by using any available
language resource and the most up-to-date DNNs.
We used AnIta
        <xref ref-type="bibr" rid="ref16">(Tamburini and Melandri, 2012)</xref>
        ,
one of the most powerful morphological analysers
for Italian, based on a wide lexicon (about 110,000
lemmas), to provide the PoS-tagger with a large
set of useful information.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Input features</title>
      <p>The set of input features for each token is
formed by two different components: the word
embedding and some morphological information.</p>
      <sec id="sec-2-1">
        <title>2.1 Word Embeddings</title>
        <p>
          All the embeddings used in our experiments were
extracted from the CORIS corpus
          <xref ref-type="bibr" rid="ref13">(Rossini Favretti
et al., 2002)</xref>
          , a 130-million-word synchronic reference
corpus for Italian, by using the tool word2vec<sup>1</sup>
          <xref ref-type="bibr" rid="ref11">(Mikolov et al., 2013)</xref>
          . We added two special
tokens to mark the sentence beginning ‘&lt;s&gt;’ and
ending ‘&lt;/s&gt;’.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Morphological features</title>
        <p>One of the most useful kinds of information for
increasing the performance of PoS-taggers is
the list of all possible tags for a single word-form.
Having a restricted list of possibilities enables the
tagger to reduce the search space and forces it to
take reasonable decisions. The results obtained
in past PoS-tagger evaluations on Italian agree
in suggesting that powerful morphological
analysers based on large lexica are invaluable resources
for increasing tagger accuracy. For these reasons,
we extended the word embeddings, computed in
a completely unsupervised way, by concatenating
to them a vector containing the possible PoS-tags
provided by the AnIta analyser. This tool is also
able to identify, through the use of simple regular
expressions, numbers, dates, URLs, emails, etc.,
and assign them the proper tag(s).</p>
        <p><sup>1</sup> https://code.google.com/archive/p/word2vec/</p>
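        <p>As a minimal sketch of how such an input vector could be assembled, the following Python fragment concatenates a word embedding with a binary vector marking the tags allowed by the analyser. The tagset, the toy embedding values and the function names <monospace>tag_vector</monospace> and <monospace>token_features</monospace> are illustrative assumptions and are not part of AnIta or of the system described here.</p>
        <preformat><![CDATA[
```python
from typing import List, Sequence

# Hypothetical coarse tagset, used only for illustration.
TAGSET: List[str] = ["NOUN", "VERB", "ADJ", "ADV", "NUM"]


def tag_vector(possible_tags: Sequence[str],
               tagset: Sequence[str] = TAGSET) -> List[float]:
    """Binary vector: 1.0 for every tag the analyser allows, 0.0 elsewhere."""
    allowed = set(possible_tags)
    return [1.0 if tag in allowed else 0.0 for tag in tagset]


def token_features(embedding: Sequence[float],
                   possible_tags: Sequence[str]) -> List[float]:
    """Concatenate the word embedding with the vector of possible tags."""
    return list(embedding) + tag_vector(possible_tags)


# Toy 3-dimensional embedding for a word-form that may be a noun or a verb.
features = token_features([0.12, -0.40, 0.88], ["NOUN", "VERB"])
print(features)
```
]]></preformat>
        <p>In this sketch the length of the resulting vector is the embedding size plus the number of tags in the tagset, so every token receives an input of the same dimension regardless of how ambiguous it is.</p>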
      </sec>
      <sec id="sec-2-3">
        <title>2.3 Unknown words handling and Sentence padding</title>
        <p>The source of most tagging errors is certainly the
presence of the so-called ‘unknown words’,
word-forms for which the tagger did not receive any
information during the training phase. A
morphological analyser based on a large lexicon can
certainly alleviate this problem by providing information
also for word-forms not belonging to the training
set, but there are large classes of tokens that cannot
be successfully handled by the analyser, for
example proper names, foreign words, etc.</p>
        <p>
          In a previous work
          <xref ref-type="bibr" rid="ref17 ref18">(Tamburini, 2007b)</xref>
          we
showed that, with such a powerful morphological
analyser, 95% of the word-forms it does not cover in real
texts belong to the classes of proper names,
adjectives and common nouns, and that a simple
heuristic correctly assigns most of these cases. In this
way AnIta always provides one or more PoS-tag
hypotheses for each word-form, which can be
transformed into a binary vector with 1s in
correspondence of the possible PoS-tags and 0s otherwise.
However, if the word-form did not have a computed
embedding, the first part of the input features would not
be defined. To solve this problem, instead of
using the common solution of assigning a random
vector to all unknown words, we averaged the
embeddings of all the other words presenting exactly
the same combination of possible PoS-tags.
        </p>
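        <p>The averaging strategy described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionaries, the toy vectors and the function name <monospace>average_by_tag_set</monospace> are hypothetical, and the sketch does not say what to do when no known word shares the tag combination of the unknown one.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Sequence


def average_by_tag_set(
    embeddings: Dict[str, Sequence[float]],
    possible_tags: Dict[str, Sequence[str]],
) -> Dict[FrozenSet[str], List[float]]:
    """Average the embeddings of all known words that share exactly
    the same combination of possible PoS-tags."""
    groups = defaultdict(list)
    for word, vector in embeddings.items():
        groups[frozenset(possible_tags[word])].append(vector)
    return {
        tags: [sum(column) / len(vectors) for column in zip(*vectors)]
        for tags, vectors in groups.items()
    }


# Toy data: two words that can only be nouns, one that can only be an adjective.
embeddings = {"casa": [1.0, 0.0], "tavolo": [0.0, 2.0], "bello": [0.5, 0.5]}
possible_tags = {"casa": ["NOUN"], "tavolo": ["NOUN"], "bello": ["ADJ"]}

fallback = average_by_tag_set(embeddings, possible_tags)
# Embedding assigned to an unknown word whose only possible tag is NOUN.
unknown_vector = fallback[frozenset(["NOUN"])]
print(unknown_vector)
```
]]></preformat>
        <p>Keying the averages on the whole set of possible tags, rather than on single tags, mirrors the requirement that the words share exactly the same combination of PoS-tags.</p>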
        <p>It is also common practice to pad sentences,
at the beginning and at the end, using random
vectors. We instead used the real
embeddings computed for the special tokens ‘&lt;s&gt;’ and
‘&lt;/s&gt;’, added for this purpose, with the
respective tags ‘BoS’ and ‘EoS’. Due to the internal
structure of the tensor manipulation library we used
(see later), we were also forced to add an
out-of-sentence vector to pad sentences to their maximal
length, together with the corresponding tag ‘OoS’.</p>
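        <p>A possible way to apply this padding scheme is sketched below. The special tokens and the three tags come from the text above, while the name of the out-of-sentence token (<monospace>&lt;oos&gt;</monospace>), the choice of appending it at the end of the sentence and the function name <monospace>pad_sentence</monospace> are assumptions made for the example.</p>
        <preformat><![CDATA[
```python
from typing import List, Sequence, Tuple

BOS_TOKEN, EOS_TOKEN = "<s>", "</s>"
OOS_TOKEN = "<oos>"  # hypothetical name for the out-of-sentence filler


def pad_sentence(tokens: Sequence[str], tags: Sequence[str],
                 max_len: int) -> Tuple[List[str], List[str]]:
    """Add sentence borders, then fill up to max_len with the OoS filler.

    max_len counts the two border tokens as well.
    """
    padded_tokens = [BOS_TOKEN] + list(tokens) + [EOS_TOKEN]
    padded_tags = ["BoS"] + list(tags) + ["EoS"]
    missing = max_len - len(padded_tokens)
    if missing < 0:
        raise ValueError("sentence longer than max_len")
    return (padded_tokens + [OOS_TOKEN] * missing,
            padded_tags + ["OoS"] * missing)


tokens, tags = pad_sentence(["il", "cane"], ["DET", "NOUN"], max_len=6)
print(tokens)
print(tags)
```
]]></preformat>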
      </sec>
      <sec id="sec-2-4">
        <title>2.4 Data structuring</title>
        <p>We experimented with two different ways of structuring
the input features for processing:</p>
        <p>Win: this mode of organising input data is
based on a sliding window that starts from the
beginning of each sentence and concatenates
word feature vectors into one single vector.
Padding is inserted at sentence borders.</p>
        <p>Seq: each sentence is managed as one single
sequence padded at the borders.</p>
        <p>Each network tested in this study uses
one of these two data structuring types.</p>
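        <p>The Win mode can be illustrated with the short sketch below, which builds one concatenated vector per token. It assumes a window centred on the current token and a single padding vector for both borders; both choices, like the function name <monospace>window_inputs</monospace>, are simplifications introduced for the example. In the Seq mode the list of per-token vectors would instead be passed to the network as one sequence.</p>
        <preformat><![CDATA[
```python
from typing import List, Sequence


def window_inputs(sentence: Sequence[Sequence[float]],
                  pad_vector: Sequence[float],
                  half_window: int) -> List[List[float]]:
    """Return, for every token, the concatenation of the feature vectors
    found in a window of 2 * half_window + 1 positions centred on it."""
    border = [list(pad_vector)] * half_window
    padded = border + [list(v) for v in sentence] + border
    size = 2 * half_window + 1
    return [
        [value for vector in padded[i:i + size] for value in vector]
        for i in range(len(sentence))
    ]


# Three tokens with 1-dimensional toy features, window of 3 positions.
windows = window_inputs([[1.0], [2.0], [3.0]], pad_vector=[0.0], half_window=1)
print(windows)
```
]]></preformat>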
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 (Deep) Learning Blocks</title>
      <p>All the experiments presented in this paper have
been performed using Keras<sup>2</sup>, “a minimalist,
highly modular neural networks library, written in
Python and capable of running on top of either
TensorFlow or Theano”, two widely used tensor
manipulation libraries. Keras provides some basic
neural network blocks as well as different
learning procedures for the desired network
configuration and simple tools for writing new blocks. In
our experiments we used some of them, namely
multilayer perceptrons (MLP) and Long
Short-Term Memory (LSTM), and we wrote a new block
to handle Conditional Random Fields (CRF).</p>
      <p>MLPs are simple feedforward neural networks
with one or more fully-connected hidden layers.
We obtained the best performance using only
one hidden layer.</p>
      <p>
        LSTM networks
        <xref ref-type="bibr" rid="ref6 ref7">(Hochreiter and Schmidhuber,
1997; Graves and Schmidhuber, 2005)</xref>
        are a kind
of recurrent neural network that has received a lot
of attention in recent years due to its ability to
produce good classification results for sequence
problems. Their ability to mitigate the
vanishing (and exploding) gradient problem that affects
standard recurrent neural networks has made them the
default choice for solving sequence classification
problems inside the DNN framework. Usually
units of this kind are arranged to form a
bidirectional chain (BiLSTM) for gathering information
both from the past and from the future of the
input data sequence, a very desirable feature for this
kind of classification problem. In all our
experiments using BiLSTM we obtained the best
performance by stacking two layers of them, with
a dropout layer after each of them
        <xref ref-type="bibr" rid="ref15">(Srivastava et
al., 2014)</xref>
        , and a final dense softmax layer, or a
time-distributed dense softmax layer, fed by
the BiLSTM output.
      </p>
      <p><sup>2</sup> https://github.com/fchollet/keras/tree/master/keras</p>
      <p>
        Linear CRFs are the simplest Probabilistic
Graphical Models (PGMs) and have been
successfully used in NLP for sequence classification
problems
        <xref ref-type="bibr" rid="ref10">(Lafferty et al., 2001)</xref>
        . We did some
experiments stacking them after the softmax layer.
      </p>
      <p>Figure 1 shows the most complex DNN
structure used in our experiments.</p>
      <p>All the experiments presented in this paper to
test the effectiveness of the proposed system
refer to two evaluation campaigns organised inside
the EVALITA framework. In particular, specific tasks
were organised in 2007 and 2009 to test
the performance of Italian PoS-taggers.</p>
      <sec id="sec-3-1">
        <title>4.1 The EVALITA 2007 evaluation</title>
        <p>Two separate data sets were provided: the
Development Set (DS), composed of 133,756 tokens,
was used for system development and for the
training phase, while a Test Set (TS), composed of
17,313 tokens, was used as a reference for
system evaluation. Both contain various documents
belonging mainly to journalistic and narrative
genres, with small sections containing academic and
legal/administrative prose. Each participant was
allowed to use any available resource or could
freely induce it from the training data.</p>
        <p>The original PoS-tagging task involved two
different tagsets, but our experiments used only the
tags and the annotation named ‘EAGLES-like’.</p>
        <p>
          The evaluation metrics were based on a
token-by-token comparison and only one tag was
allowed for each token. The EVALITA metric
considered in this study is the Tagging Accuracy,
defined as the number of correct PoS-tag
assignments divided by the total number of tokens in the
TS. See
          <xref ref-type="bibr" rid="ref17 ref18">(Tamburini, 2007)</xref>
          for further details.
        </p>
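        <p>For concreteness, the Tagging Accuracy defined above can be computed as in the following sketch. The function name <monospace>tagging_accuracy</monospace> and the toy tag sequences are illustrative only and do not reproduce the official EVALITA scorer.</p>
        <preformat><![CDATA[
```python
from typing import Sequence


def tagging_accuracy(gold_tags: Sequence[str],
                     predicted_tags: Sequence[str]) -> float:
    """Correct PoS-tag assignments divided by the total number of tokens."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_tags:
        raise ValueError("empty test set")
    correct = sum(1 for gold, pred in zip(gold_tags, predicted_tags)
                  if gold == pred)
    return correct / len(gold_tags)


# Toy test set of four tokens with one wrong prediction.
accuracy = tagging_accuracy(["DET", "NOUN", "VERB", "ADJ"],
                            ["DET", "NOUN", "VERB", "NOUN"])
print(accuracy)
```
]]></preformat>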
      </sec>
      <sec id="sec-3-2">
        <title>4.2 The EVALITA 2009 evaluation</title>
        <p>The DS consisted of 113,895 word forms (already
divided into a training set of 108,874 tokens and a
validation set of 5,021 tokens). The TS consisted of
5,066 word forms. The training set is formed by
newspaper articles from ‘La Repubblica’, while
the validation and test sets contain documents
extracted from the Italian Wikipedia. This tests the
degree of system adaptation to new domains.</p>
        <p>
          The organisers evaluated the results using a
coarse-grained (37 tags) and a morphed (336 tags)
tagset inserted in a closed/open task framework,
but in this study all the results refer to the open
task (where external resources can be used) on the
coarse-grained tagset. The evaluation metric is the same
as described in section 4.1. See
          <xref ref-type="bibr" rid="ref1">(Attardi and
Simi, 2009)</xref>
          for further details.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3 Hyper-Parameters</title>
        <p>Considering the large number of hyper-parameters
involved in the whole procedure, we did not test all
the possible combinations; we used, instead, the
most common set-up of parameters gathered from
the literature. Table 1 outlines the whole set-up for
the unmodified hyper-parameters.</p>
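        <table-wrap id="tab-1">
          <label>Table 1</label>
          <table>
            <thead>
              <tr><th>Group</th><th>Hyperpar.</th><th>Value</th></tr>
            </thead>
            <tbody>
              <tr><td>word2vec Embed.</td><td>type</td><td>SkipGr.</td></tr>
              <tr><td>word2vec Embed.</td><td>size</td><td>100</td></tr>
              <tr><td>word2vec Embed.</td><td>(1/2) win.</td><td>5</td></tr>
              <tr><td>word2vec Embed.</td><td>neg. sampl.</td><td>25</td></tr>
              <tr><td>word2vec Embed.</td><td>sample</td><td>1e-4</td></tr>
              <tr><td>word2vec Embed.</td><td>iter</td><td>15</td></tr>
              <tr><td>Feature extraction</td><td>window</td><td>5</td></tr>
              <tr><td>Learning Params.</td><td>batch (win)</td><td>1/4*NU</td></tr>
              <tr><td>Learning Params.</td><td>batch (seq)</td><td>1</td></tr>
              <tr><td>Learning Params.</td><td>Opt. Alg.</td><td>Adam</td></tr>
              <tr><td>Learning Params.</td><td>Loss Func.</td><td>Categ.CE</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-3-6">
        <p>
There are some interesting studies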
There are some interesting studies
          <xref ref-type="bibr" rid="ref12 ref2">(Bengio, 2012;
Prechelt, 2012)</xref>
          dealing with the problem of
stopping the learning process at the right point; this
issue is known as the ‘early stopping’ problem.
Choosing the correct epoch to stop the learning
process helps to avoid overfitting on the training
set and usually produces systems exhibiting
better generalisation. However, choosing the correct
epoch is not simple. The suggestion given in
various studies on this topic is to consider a validation
set and stop the learning process when the
performance on this set no longer increases or even
decreases, a clear hint of overfitting.
        </p>
        <p>The usual way to set up an experiment
following this suggestion involves splitting the gold
standard into three different instance sets: the
training set, for training, the validation set, to
determine the stopping point, and the test set, to
evaluate the system. However, we are testing our
systems on real evaluation data that have already been
split by the organisers into development and test
sets. Thus, we can divide the development set into
training and validation sets for optimising the
hyper-parameters and defining the stopping epoch, but, for
the final evaluation, we would like to train the final
system on the complete development set to adhere
to the evaluation constraints and to benefit from
using more training data.</p>
        <p>Having two different training procedures for the
optimisation and evaluation phases leads to a more
complex procedure for determining the stopping
epoch. Moreover, the typical accuracy profile for
DNN systems is not smooth and oscillates
heavily during training. To avoid any problem in
determining the stopping point we smoothed all the
profiles using a Bézier spline. The procedure we
adopted to determine the stopping epoch is the following
(see Fig. 2): (1) find the first maximum in the
smoothed validation profile (point A); (2) find the
corresponding accuracy value on the smoothed
training profile (point B); (3) find the point in the smoothed
development set profile having the same accuracy
as B (point C); (4) select the epoch corresponding to
point C as the stopping epoch (D).</p>
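        <p>The four steps above can be sketched in Python as follows. Several points are assumptions made for the example and are not stated in the text: the smoothing is taken to be a single Bézier curve whose control points are the per-epoch accuracies, the first maximum is taken to be the first epoch after which the smoothed profile decreases, “same accuracy” is read as the first epoch at which the development profile reaches the value B, and epochs are counted from 0. All function names are hypothetical.</p>
        <preformat><![CDATA[
```python
from math import comb
from typing import Callable, List, Sequence

Profile = Sequence[float]


def bezier_smooth(profile: Profile) -> List[float]:
    """Evaluate, at every epoch, the Bezier curve whose control points
    are the raw per-epoch accuracies (assumed smoothing scheme)."""
    n = len(profile) - 1  # degree of the curve
    if n < 1:
        return list(profile)
    smoothed = []
    for epoch in range(n + 1):
        t = epoch / n
        smoothed.append(sum(comb(n, i) * t ** i * (1 - t) ** (n - i) * y
                            for i, y in enumerate(profile)))
    return smoothed


def first_maximum(profile: Profile) -> int:
    """First epoch after which the profile decreases (last epoch if none)."""
    for epoch in range(len(profile) - 1):
        if profile[epoch + 1] < profile[epoch]:
            return epoch
    return len(profile) - 1


def stopping_epoch(validation: Profile, training: Profile,
                   development: Profile,
                   smooth: Callable[[Profile], List[float]] = bezier_smooth
                   ) -> int:
    """Apply steps (1)-(4) to the three accuracy profiles."""
    val_s, train_s, dev_s = (smooth(validation), smooth(training),
                             smooth(development))
    epoch_a = first_maximum(val_s)            # (1) point A
    accuracy_b = train_s[epoch_a]             # (2) point B
    for epoch, accuracy in enumerate(dev_s):  # (3) point C
        if accuracy >= accuracy_b:
            return epoch                      # (4) stopping epoch D
    return len(dev_s) - 1


# Toy profiles, passed through unchanged to keep the example easy to follow.
epoch_d = stopping_epoch(
    validation=[0.80, 0.90, 0.88, 0.91],
    training=[0.85, 0.95, 0.97, 0.98],
    development=[0.84, 0.93, 0.96, 0.97],
    smooth=list,
)
print(epoch_d)
```
]]></preformat>
        <p>In the toy run the validation profile peaks at epoch 1, where the training accuracy is 0.95; the development profile first reaches that value at epoch 2, which is therefore selected as the stopping epoch.</p>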
      </sec>
      <sec id="sec-3-7">
        <title>4.5 Results</title>
        <p>Table 2 outlines the systems’ accuracies for
different configurations on both datasets. We can
observe that using AnIta morphological
information, together with the techniques described
in section 2.3, improves the systems’ results by
more than 1%. Considering the data structuring
described in section 2.4, the management of an
entire sentence as a complete sequence allows
recurrent configurations to work with larger contexts,
producing better results. Adding a CRF layer after
the BiLSTM seems to slightly improve
performance, but not in a significant way.</p>
      </sec>
      <sec id="sec-3-8">
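        <table-wrap id="tab-2">
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th>SYSTEM</th></tr>
            </thead>
            <tbody>
              <tr><td>MLP-256</td></tr>
              <tr><td>MLP-256</td></tr>
              <tr><td>2-BiLSTM-256</td></tr>
              <tr><td>2-BiLSTM-256</td></tr>
              <tr><td>2-BiLSTM-256-CRF</td></tr>
            </tbody>
          </table>
        </table-wrap>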
        <p>Table 3 compares the performance of our best
system, AnIta-BiLSTM-CRF,
with the three best systems of the considered
EVALITA campaigns. In both
cases the proposed system ranks first, improving
the scores by a large margin.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusions</title>
      <p>The proposed system for PoS-tagging,
integrating DNNs and a powerful morphological analyser,
exhibited very good accuracy results when
applied to standard Italian evaluation datasets from
the EVALITA campaigns. The information from
AnIta proved to be crucial for reaching such accuracy
values, as did the stacked BiLSTM networks
processing entire sentence sequences.</p>
      <p>We still have to test different DNN
configurations and their integration with other kinds of
PGMs, as well as carry out more experiments with
different hyper-parameters.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Attardi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Maria</given-names>
            <surname>Simi</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Overview of the EVALITA 2009 Part-of-Speech Tagging Task</article-title>
          .
          <source>In Proc. of Workshop Evalita</source>
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Practical Recommendations for Gradient-Based Training of Deep Architectures</article-title>
          . In Grégoire Montavon, Geneviève B. Orr, and Klaus-Robert Müller, editors,
          <source>Neural Networks: Tricks of the Trade: Second Edition</source>
          , pages
          <fpage>437</fpage>
          -
          <lpage>478</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Jason</given-names>
            <surname>Chiu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Eric</given-names>
            <surname>Nichols</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Sequential Labeling with Bidirectional LSTM-CNNs</article-title>
          .
          <source>In Proc. International Conf. of Japanese Association for NLP</source>
          , pages
          <fpage>937</fpage>
          -
          <lpage>940</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Ronan</given-names>
            <surname>Collobert</surname>
          </string-name>
          , Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Kuksa</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Natural language processing (almost) from scratch</article-title>
          .
          <source>J. Mach. Learn. Res.</source>
          ,
          <volume>12</volume>
          :
          <fpage>2493</fpage>
          -
          <lpage>2537</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Cicero</given-names>
            <surname>dos Santos</surname>
          </string-name>
          and
          <string-name>
            <given-names>Bianca</given-names>
            <surname>Zadrozny</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Learning character-level representations for part-of-speech tagging</article-title>
          .
          <source>In Proc. of the 31st International Conference on Machine Learning</source>
          , volume
          <volume>32</volume>
          . JMLR W&amp;CP.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Alex</given-names>
            <surname>Graves</surname>
          </string-name>
          and Jürgen Schmidhuber.
          <year>2005</year>
          .
          <article-title>Framewise phoneme classification with bidirectional lstm and other neural network architectures</article-title>
          .
          <source>Neural Networks</source>
          ,
          <volume>18</volume>
          (
          <issue>5-6</issue>
          ):
          <fpage>602</fpage>
          -
          <lpage>610</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and Jürgen Schmidhuber.
          <year>1997</year>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Zhiheng</given-names>
            <surname>Huang</surname>
          </string-name>
          , Wei Xu, and
          <string-name>
            <given-names>Kai</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Bidirectional LSTM-CRF Models for Sequence Tagging</article-title>
          . ArXiv e-prints, 1508.01991.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>D.P.</given-names>
            <surname>Kingma</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.L.</given-names>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Adam: a method for stochastic optimization</article-title>
          .
          <source>In Proc. International Conference on Learning Representations - ICLR</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>J.</given-names>
            <surname>Lafferty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>McCallum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Pereira</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          .
          <source>In Proc. 18th International Conf. on Machine Learning</source>
          , pages
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Kai Chen, Greg Corrado, and
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          .
          <source>In Proc. of Workshop at ICLR</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Lutz</given-names>
            <surname>Prechelt</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Early Stopping - But When?</article-title>
          In Grégoire Montavon, Geneviève B. Orr, and Klaus-Robert Müller, editors,
          <source>Neural Networks: Tricks of the Trade: Second Edition</source>
          , pages
          <fpage>53</fpage>
          -
          <lpage>67</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Rema</given-names>
            <surname>Rossini Favretti</surname>
          </string-name>
          , Fabio Tamburini, and Cristiana De Santis.
          <year>2002</year>
          .
          <article-title>CORIS/CODIS: A corpus of written Italian based on a defined and a dynamic model</article-title>
          . In Andrew Wilson, Paul Rayson, and Tony McEnery, editors,
          <source>A Rainbow of Corpora: Corpus Linguistics and the Languages of the World</source>
          , pages
          <fpage>27</fpage>
          -
          <lpage>38</lpage>
          . Lincom-Europa, Munich.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Anders</given-names>
            <surname>Søgaard</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Semi-supervised condensed nearest neighbor for part-of-speech tagging</article-title>
          .
          <source>In Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</source>
          , pages
          <fpage>48</fpage>
          -
          <lpage>52</lpage>
          , Portland, Oregon, USA.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Nitish</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and
          <string-name>
            <given-names>Ruslan</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Dropout: A simple way to prevent neural networks from overfitting</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          ,
          <volume>15</volume>
          :
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Tamburini</surname>
          </string-name>
          and
          <string-name>
            <given-names>Matias</given-names>
            <surname>Melandri</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>AnIta: a powerful morphological analyser for Italian</article-title>
          .
          <source>In Proc. 8th International Conference on Language Resources and Evaluation - LREC</source>
          <year>2012</year>
          , pages
          <fpage>941</fpage>
          -
          <lpage>947</lpage>
          , Istanbul.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Tamburini</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>EVALITA 2007: the Part-of-Speech Tagging Task</article-title>
          .
          <source>Intelligenza Artificiale</source>
          , IV(2):
          <fpage>4</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Tamburini</surname>
          </string-name>
          .
          <year>2007b</year>
          .
          <article-title>CORISTagger: a high-performance PoS tagger for Italian</article-title>
          .
          <source>Intelligenza Artificiale</source>
          , IV(2):
          <fpage>14</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Peilu</given-names>
            <surname>Wang</surname>
          </string-name>
          , Yao Qian, Frank K. Soong, Lei He, and
          <string-name>
            <given-names>Hai</given-names>
            <surname>Zhao</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding</article-title>
          . ArXiv e-prints, 1511.00215
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>