<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">(Better than) State-of-the-Art PoS-tagging for Italian Texts</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Fabio</forename><surname>Tamburini</surname></persName>
							<email>fabio.tamburini@unibo.it</email>
							<affiliation key="aff0">
								<orgName type="institution">FICLIT -University of Bologna</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">(Better than) State-of-the-Art PoS-tagging for Italian Texts</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">531FA426A5CE903CA86E684FA0AE3531</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T23:47+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract xml:lang="it">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>English. This paper presents some experiments for the construction of a high-performance PoS-tagger for Italian using deep neural network techniques (DNN) integrated with a powerful morphological analyser for Italian.</p><p>The results obtained by the proposed system on standard datasets taken from the EVALITA campaigns show large accuracy improvements when compared with previous systems from the literature.</p><p>Italiano. Questo contributo presenta alcuni esperimenti per la costruzione di un PoS-tagger ad alte prestazioni per l'italiano utilizzando reti neurali 'deep' integrate con un potente analizzatore morfologico. I risultati ottenuti sui dataset delle campagne EVALITA da parte del sistema proposto mostrano incrementi di accuratezza piuttosto rilevanti in confronto ai precedenti sistemi in letteratura.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In recent years a large number of works have tried to push the accuracy of the PoS-tagging task forward using new techniques, mainly from the deep learning domain <ref type="bibr" target="#b3">(Collobert et al., 2011;</ref><ref type="bibr" target="#b12">Søgaard, 2011;</ref><ref type="bibr" target="#b4">dos Santos and Zadrozny, 2014;</ref><ref type="bibr" target="#b6">Huang et al., 2015;</ref><ref type="bibr" target="#b17">Wang et al., 2015;</ref><ref type="bibr" target="#b2">Chiu and Nichols, 2016)</ref>.</p><p>All these studies are mainly devoted to showing how to find the best combination of new neural network structures and character/word embeddings to reach the highest classification performance, and they typically present solutions that do not make any use of specific language resources (e.g. morphological analysers, gazetteers, guessing procedures for unknown words, etc.). This is, in general, a very desirable feature because it allows for the production of tools not tied to any specific language, but in various evaluation campaigns, at least for highly inflected languages such as Italian, the results showed quite clearly that this task benefits from the use of specific and rich language resources <ref type="bibr" target="#b15">(Tamburini, 2007;</ref><ref type="bibr" target="#b0">Attardi and Simi, 2009)</ref>.</p><p>In this study, still a work in progress, we set up a PoS-tagger for Italian designed to reach the highest classification performance by using any available language resource and the most up-to-date DNNs. We used AnIta <ref type="bibr" target="#b14">(Tamburini and Melandri, 2012)</ref>, one of the most powerful morphological analysers for Italian, based on a wide lexicon (about 110,000 lemmas), to provide the PoS-tagger with a large set of useful information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Input features</head><p>The set of input features for each token is basically formed by two different components: the word embedding and some morphological information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Word Embeddings</head><p>All the embeddings used in our experiments were extracted from the CORIS corpus <ref type="bibr" target="#b11">(Rossini Favretti et al., 2002)</ref>, a 130Mw synchronic reference corpus of Italian, using the tool word2vec<ref type="foot" target="#foot_0">1</ref> <ref type="bibr" target="#b9">(Mikolov et al., 2013)</ref>. We added two special tokens, '&lt;s&gt;' and '&lt;/s&gt;', to mark the beginning and the end of each sentence.</p></div>
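The boundary-token preparation described above can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and the toy corpus are hypothetical, and the prepared sentences would then be fed to word2vec so that '&lt;s&gt;' and '&lt;/s&gt;' receive embeddings like any other word form.

```python
def add_boundaries(sentences):
    """Wrap each tokenised sentence with the special boundary tokens
    '<s>' and '</s>' so that word2vec learns embeddings for them
    alongside the ordinary word forms."""
    return [['<s>'] + sent + ['</s>'] for sent in sentences]

# hypothetical toy corpus of tokenised Italian sentences
corpus = [['il', 'gatto', 'dorme'], ['piove']]
prepared = add_boundaries(corpus)
# prepared[0] == ['<s>', 'il', 'gatto', 'dorme', '</s>']
```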
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Morphological features</head><p>One of the most useful kinds of information for increasing the performance of PoS-taggers is the list of all possible tags for a single word-form. Having a restricted list of possibilities enables the tagger to reduce the search space and forces it to make reasonable decisions. The results obtained in past PoS-tagger evaluations on Italian agree in suggesting that powerful morphological analysers based on large lexica are invaluable resources for increasing tagger accuracy. For these reasons, we extended the word embeddings, computed in a completely unsupervised way, by concatenating to them a vector containing the possible PoS-tags provided by the AnIta analyser. This tool is also able to identify numbers, dates, URLs, emails, etc. through simple regular expressions, and to assign them the proper tag(s).</p></div>
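The concatenation of an embedding with the analyser's tag information can be sketched as below. The tagset shown is an illustrative subset, not the actual tagset used in the evaluations, and the helper names and toy values are hypothetical.

```python
# illustrative subset of a PoS tagset (not the actual EVALITA tagset)
TAGSET = ['ADJ', 'ADV', 'ART', 'CONJ', 'NOUN', 'PREP', 'PRON', 'VERB']

def tag_vector(possible_tags):
    """Binary vector with 1s at the positions of the PoS-tags that the
    morphological analyser allows for a word form, 0s elsewhere."""
    return [1.0 if t in possible_tags else 0.0 for t in TAGSET]

def input_features(embedding, possible_tags):
    """Concatenate the word embedding with the analyser's tag vector."""
    return embedding + tag_vector(possible_tags)

# 'amo' is ambiguous in Italian (noun 'hook' / verb 'I love'):
feats = input_features([0.12, -0.05, 0.33], {'NOUN', 'VERB'})
```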
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Unknown words handling and Sentence padding</head><p>The source of most tagging errors is certainly the presence of the so-called 'unknown words', word-forms for which the tagger did not receive any information during the training phase. A morphological analyser based on a large lexicon can certainly alleviate this problem by providing information also for word-forms not belonging to the training set, but there are large classes of tokens that cannot be successfully handled by the analyser, for example proper names, foreign words, etc. In a previous work <ref type="bibr" target="#b16">(Tamburini, 2007b)</ref> we showed that, when using such a powerful morphological analyser, 95% of the word-forms in real texts not covered by it belong to the classes of proper names, adjectives and common nouns, and a simple heuristic correctly handles most of these cases. In this way AnIta always provides one or more PoS-tag hypotheses for each word-form, which can be transformed into a binary vector with 1s at the positions of the possible PoS-tags and 0s elsewhere; but if the word-form does not have a computed embedding, the first part of the input features would not be defined. To solve this problem, instead of using the common solution of assigning a random vector to all unknown words, we averaged all the embeddings of the other words presenting exactly the same combination of possible PoS-tags.</p><p>It is also common practice to pad sentences, at the beginning and at the end, using random vectors; we, instead, used the real embeddings computed for the special tokens '&lt;s&gt;' and '&lt;/s&gt;', added for this purpose, with the respective tags 'BoS' and 'EoS'. Due to the internal structure of the tensor-manipulation library we used (see later), we were also forced to add an out-of-sentence vector to pad sentences to their maximal length, with the corresponding tag 'OoS'.</p></div>
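The averaging strategy for out-of-vocabulary word-forms can be sketched as follows. This is a simplified illustration, not the authors' implementation: the lexicon layout, function name and toy embeddings are hypothetical, and the fallback behaviour when no known word shares the tag set is an assumption.

```python
def unknown_embedding(possible_tags, lexicon):
    """Embedding for an out-of-vocabulary word-form: the average of the
    embeddings of all known words sharing exactly the same set of
    possible PoS-tags (None when no such word exists)."""
    matches = [emb for emb, tags in lexicon.values() if tags == possible_tags]
    if not matches:
        return None
    dim = len(matches[0])
    return [sum(v[i] for v in matches) / len(matches) for i in range(dim)]

# hypothetical mini-lexicon: word -> (embedding, possible PoS-tags)
lexicon = {
    'casa':  ([1.0, 0.0], frozenset({'NOUN'})),
    'libro': ([0.0, 1.0], frozenset({'NOUN'})),
    'ama':   ([0.5, 0.5], frozenset({'VERB'})),
}
oov = unknown_embedding(frozenset({'NOUN'}), lexicon)  # [0.5, 0.5]
```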
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Data structuring</head><p>We experimented with two different ways of structuring the input features for processing:</p><p>• Win: this mode of organising input data is based on a sliding window that starts from the beginning of each sentence and concatenates the word feature vectors into one single vector.</p><p>Padding is inserted at sentence borders.</p><p>• Seq: each sentence is managed as one single sequence padded at the borders.</p><p>Each network tested in this study uses one of these two data-structuring types.</p></div>
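The 'Win' structuring can be sketched as below: a window slides over the sentence and the feature vectors inside it are concatenated, with the boundary-token vectors used as padding. The function name, window size and toy one-dimensional features are illustrative assumptions.

```python
def windows(sent_feats, bos, eos, half_win=2):
    """'Win' structuring: slide a window of 2*half_win+1 tokens over
    the sentence and concatenate the feature vectors inside it;
    border positions are padded with the '<s>'/'</s>' vectors."""
    padded = [bos] * half_win + sent_feats + [eos] * half_win
    size = 2 * half_win + 1
    out = []
    for i in range(len(sent_feats)):
        concat = []
        for vec in padded[i:i + size]:
            concat.extend(vec)
        out.append(concat)
    return out

# toy 1-dimensional feature vectors for a 3-token sentence
sent = [[1.0], [2.0], [3.0]]
X = windows(sent, bos=[-1.0], eos=[-2.0], half_win=1)
# X[0] == [-1.0, 1.0, 2.0]  (window centred on the first token)
```

In the 'Seq' mode, by contrast, the whole list of per-token feature vectors is kept as one sequence and only padded at the borders.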
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">(Deep) Learning Blocks</head><p>All the experiments presented in this paper have been performed using Keras<ref type="foot" target="#foot_1">2</ref>, "a minimalist, highly modular neural networks library, written in Python and capable of running on top of either TensorFlow or Theano", two widely used tensor manipulation libraries. Keras provides some basic neural network blocks, as well as different learning procedures for the desired network configuration and simple tools for writing new blocks. In our experiments we used some of them, namely multilayer perceptrons (MLP) and Long Short-Term Memory (LSTM) networks, and we wrote a new block to handle Conditional Random Fields (CRF).</p><p>MLPs are simple feedforward neural networks with one or more fully-connected hidden layers. We obtained maximum performance using only one hidden layer.</p><p>LSTM networks <ref type="bibr" target="#b5">(Hochreiter and Schmidhuber, 1997;</ref><ref type="bibr" target="#b4">Graves and Schmidhuber, 2005)</ref> are a kind of recurrent neural network that has received a lot of attention in recent years due to its ability to produce good classification results for sequence problems. Their property of preventing the vanishing (and exploding) gradient problem that affects standard recurrent neural networks has made them the default choice for solving sequence classification problems within the DNN framework. Usually units of this kind are arranged to form a bidirectional chain (BiLSTM) to gather information both from the past and from the future of the input data sequence, a very desirable property for this kind of classification problem.
In all our experiments using BiLSTMs we obtained maximum performance by stacking two layers of them, with a dropout layer after each <ref type="bibr" target="#b13">(Srivastava et al., 2014)</ref>, and a final dense softmax layer, or a time-distributed dense softmax layer, fed by the BiLSTM output.</p><p>Linear CRFs are the simplest Probabilistic Graphical Models (PGM) and have been successfully used in NLP for sequence classification problems <ref type="bibr" target="#b8">(Lafferty et al., 2001)</ref>. We did some experiments stacking them after the softmax layer.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> shows the most complex DNN structure used in our experiments.</p></div>
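The stacked BiLSTM configuration described above can be sketched in Keras roughly as follows. This is a minimal illustration, not the authors' actual code: it assumes the current Keras API, uses the 256-unit layers reported in the experiments, and treats the dropout rate, maximal sequence length, feature dimension and tagset size as illustrative values; the custom CRF block mentioned in the text is omitted.

```python
from keras import Input
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, Dropout, TimeDistributed, Dense

MAXLEN, FEAT_DIM, N_TAGS = 50, 120, 37   # illustrative sizes

# Two stacked BiLSTM layers, each followed by dropout, and a final
# time-distributed softmax over the tagset; the paper's custom CRF
# block would be stacked after the softmax and is not shown here.
model = Sequential([
    Input(shape=(MAXLEN, FEAT_DIM)),
    Bidirectional(LSTM(256, return_sequences=True)),
    Dropout(0.5),                         # rate not stated in the paper
    Bidirectional(LSTM(256, return_sequences=True)),
    Dropout(0.5),
    TimeDistributed(Dense(N_TAGS, activation='softmax')),
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
```

The Adam optimiser and categorical cross-entropy loss match the learning parameters reported in Table 1.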
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>All the experiments presented in this paper to test the effectiveness of the proposed system refer to two evaluation campaigns organised within the EVALITA 3 framework. In particular, in 2007 and in 2009 specific tasks were organised to test the performance of Italian PoS-taggers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">The EVALITA 2007 evaluation</head><p>Two separate data sets were provided: the Development Set (DS), composed of 133,756 tokens, was used for system development and for the training phase, while a Test Set (TS), composed of 17,313 tokens, was used as a reference for system evaluation. Both contain various documents belonging mainly to the journalistic and narrative genres, with small sections containing academic and legal/administrative prose. Each participant was allowed to use any available resource or could freely induce one from the training data. The original PoS-tagging task involved two different tagsets, but our experiments used only the tags and the annotation named 'EAGLES-like'.</p><p>The evaluation metrics were based on a token-by-token comparison and only one tag was allowed for each token. The EVALITA metric considered in this study is the Tagging Accuracy, defined as the number of correct PoS-tag assignments divided by the total number of tokens in the TS. See <ref type="bibr" target="#b15">(Tamburini, 2007)</ref> for further details.</p></div>
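The Tagging Accuracy metric defined above is straightforward to compute; the sketch below is an illustration with a hypothetical function name and toy tag sequences.

```python
def tagging_accuracy(gold, predicted):
    """EVALITA Tagging Accuracy: correct PoS-tag assignments divided by
    the total number of tokens in the test set (one tag per token)."""
    assert len(gold) == len(predicted)
    correct = sum(1 for g, p in zip(gold, predicted) if g == p)
    return correct / len(gold)

acc = tagging_accuracy(['NOUN', 'VERB', 'ADJ', 'NOUN'],
                       ['NOUN', 'VERB', 'ADV', 'NOUN'])
# acc == 0.75
```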
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">The EVALITA 2009 evaluation</head><p>The DS consisted of 113,895 word forms (already divided into a training set of 108,874 tokens and a validation set of 5,021 tokens). The TS consisted of 5,066 word forms. The training set is formed of newspaper articles from 'La Repubblica', while the validation and test sets contain documents extracted from the Italian Wikipedia. This tests the degree of adaptation of the systems to new domains.</p><p>The organisers evaluated the results using a coarse-grained (37 tags) and a morphed (336 tags) tagset within a closed/open task framework, but in this study all the results refer to the open task (in which external resources may be used) on the coarse-grained tagset. The evaluation metric is the same as described in section 4.1. See <ref type="bibr" target="#b0">(Attardi and Simi, 2009)</ref> for further details.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Hyper-Parameters</head><p>Considering the large number of hyper-parameters involved in the whole procedure, we did not test all the possible combinations; instead, we used the most common parameter set-up gathered from the literature. Table <ref type="table" target="#tab_0">1</ref> outlines the whole set-up for the unmodified hyper-parameters; for the Adam optimiser refer to <ref type="bibr" target="#b7">(Kingma and Ba, 2015)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">The Early Stopping Drama</head><p>There are some interesting studies <ref type="bibr" target="#b1">(Bengio, 2012;</ref><ref type="bibr" target="#b10">Prechelt, 2012)</ref> dealing with the problem of stopping the learning process at the right point; this issue is known as the 'early stopping' problem.</p><p>Choosing the correct epoch at which to stop the learning process helps avoid overfitting on the training set and usually produces systems exhibiting better generalisation. However, choosing the correct epoch is not simple. The suggestion given in various studies on this topic is to consider a validation set and stop the learning process when the performance on this set no longer increases, or even decreases, a clear hint of overfitting. The usual way to set up an experiment following this suggestion involves splitting the gold standard into three different instance sets: the training set, for training; the validation set, to determine the stopping point; and the test set, to evaluate the system. However, we are testing our systems on real evaluation data that have already been split by the organisers into a development set and a test set. Thus, we can divide the development set into training/validation sets for optimising the hyper-parameters and defining the stopping epoch, but, for the final evaluation, we would like to train the final system on the complete development set, both to adhere to the evaluation constraints and to benefit from using more training data.</p><p>Having two different training procedures for the optimisation and evaluation phases leads to a more complex procedure for determining the stopping epoch. Moreover, the typical accuracy profile of DNN systems is not smooth and oscillates heavily during training. To avoid any problem in determining the stopping point we smoothed all the profiles using a Bézier spline. The procedure we adopted to determine the stopping epoch is (please look at Fig.
<ref type="figure">2):</ref> (1) find the first maximum in the smoothed validation profile -A; (2) find the corresponding accuracy value on the smoothed training profile -B; (3) find the point in the smoothed development-set profile having the same accuracy as in B -C; (4) select the epoch corresponding to point C as the stopping epoch -D.</p></div>
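The four steps above can be sketched as follows. This is an illustration, not the authors' implementation: a simple moving average stands in for the Bézier-spline smoothing used in the paper, the function names are hypothetical, and the toy accuracy profiles are invented for the example.

```python
def smooth(profile, k=3):
    """Moving-average stand-in for the Bezier-spline smoothing used in
    the paper (window of k epochs, shrunk at the borders)."""
    half = k // 2
    return [sum(profile[max(0, i - half):i + half + 1]) /
            len(profile[max(0, i - half):i + half + 1])
            for i in range(len(profile))]

def stopping_epoch(val_profile, train_profile, dev_profile):
    """(1) first maximum of the smoothed validation profile (A);
    (2) training accuracy at that epoch (B);
    (3) first epoch where the smoothed full-development-set profile
        reaches B, selected as the stopping epoch (C/D)."""
    val, train, dev = smooth(val_profile), smooth(train_profile), smooth(dev_profile)
    a = max(range(len(val)), key=lambda i: val[i])   # first argmax
    b = train[a]
    for epoch, acc in enumerate(dev):
        if acc >= b:
            return epoch
    return len(dev) - 1

# toy accuracy profiles over five epochs (invented values)
epoch = stopping_epoch([0.90, 0.95, 0.96, 0.95, 0.94],
                       [0.92, 0.96, 0.98, 0.99, 0.995],
                       [0.93, 0.96, 0.97, 0.985, 0.99])
```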
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.5">Results</head><p>Table 2 outlines the systems' accuracies for different configurations on both datasets. We can observe that using AnIta morphological information, as well as all the techniques described in section 2.3, improves the systems' results by more than 1% (Figure <ref type="figure">2</ref> shows the early stopping procedure). Considering the data structuring described in section 2.4, the management of an entire sentence as a complete sequence allows recurrent configurations to work with larger contexts, producing better results. Adding a CRF layer after the BiLSTM seems to improve performance slightly, but not significantly. In Table <ref type="table" target="#tab_2">3</ref> we can see the performance of our best system, namely AnIta-BiLSTM-CRF, compared with the three best systems of the considered EVALITA campaigns. As can be seen, in both cases the proposed system ranked first, improving the scores by a large margin.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusions</head><p>The proposed system for PoS-tagging, integrating DNNs and a powerful morphological analyser, exhibited very good accuracy when applied to the standard Italian evaluation datasets from the EVALITA campaigns. The information from AnIta, as well as the stacked BiLSTM networks processing entire sentence sequences, proved crucial to reaching such accuracy values. We still have to test further DNN configurations and their integration with other kinds of PGMs, as well as run more experiments with different hyper-parameters.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1:</head><label>1</label><figDesc>Figure 1: The most complex DNN used in our experiments.</figDesc><graphic coords="3,77.67,221.94,206.92,185.72" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>3 http://www.evalita.it/</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1:</head><label>1</label><figDesc>Unmodified hyper-parameters and algorithms used in our experiments. NU means the number of hidden or LSTM units per layer (the same for all layers).</figDesc><table><row><cell>word2vec Embed.</cell><cell></cell><cell>Feature extraction</cell><cell></cell></row><row><cell>Hyperpar.</cell><cell>Value</cell><cell>Hyperpar.</cell><cell>Value</cell></row><row><cell>type</cell><cell>SkipGr.</cell><cell>window</cell><cell>5</cell></row><row><cell>size</cell><cell>100</cell><cell>Learning Params.</cell><cell></cell></row><row><cell>(1/2) win.</cell><cell>5</cell><cell>batch (win)</cell><cell>1/4*NU</cell></row><row><cell>neg. sampl.</cell><cell>25</cell><cell>batch (seq)</cell><cell>1</cell></row><row><cell>sample</cell><cell>1e-4</cell><cell>Opt. Alg.</cell><cell>Adam</cell></row><row><cell>iter</cell><cell>15</cell><cell>Loss Func.</cell><cell>Categ.CE</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Tagging accuracies (TA) for different configurations for both datasets. ('M' marks the use of AnIta morphological information).</figDesc><table><row><cell></cell><cell>TA</cell><cell>Notes</cell></row><row><cell></cell><cell>E07</cell><cell>E09</cell></row><row><cell>MLP-256</cell><cell cols="2">96.45 95.57 Win=5</cell></row><row><cell>MLP-256</cell><cell cols="2">97.75 96.84 M,Win=5</cell></row><row><cell>2-BiLSTM-256</cell><cell cols="2">98.12 97.30 M,Win=5</cell></row><row><cell>2-BiLSTM-256</cell><cell cols="2">98.14 97.45 M,Seq</cell></row><row><cell cols="3">2-BiLSTM-256-CRF 98.18 97.48 M,Seq</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Participants' results with respect to Tagging Accuracy (TA) at EVALITA 2007 and 2009.</figDesc><table><row><cell>EVALITA 2007</cell><cell></cell></row><row><cell>SYSTEM</cell><cell>TA</cell></row><row><cell>AnIta-BiLSTM-CRF</cell><cell>98.18</cell></row><row><cell>FBKirst Zanoli</cell><cell>98.04</cell></row><row><cell>UniTn Baroni</cell><cell>97.89</cell></row><row><cell>ILCcnrUniPi Lenci</cell><cell>97.65</cell></row><row><cell>EVALITA 2009</cell><cell></cell></row><row><cell>AnIta-BiLSTM-CRF</cell><cell>97.48</cell></row><row><cell>UniPi SemaWiki 2</cell><cell>97.03</cell></row><row><cell>UniPi SemaWiki 1</cell><cell>96.73</cell></row><row><cell>UniPi SemaWiki 4</cell><cell>96.67</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://code.google.com/archive/p/word2vec/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://github.com/fchollet/keras/tree/master/keras</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of the EVALITA 2009 Part-of-Speech Tagging Task</title>
		<author>
			<persName><forename type="first">Giuseppe</forename><surname>Attardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Maria</forename><surname>Simi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of Workshop Evalita</title>
				<meeting>of Workshop Evalita</meeting>
		<imprint>
			<date type="published" when="2009">2009. 2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Practical Recommendations for Gradient-Based Training of Deep Architectures</title>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Neural Networks: Tricks of the Trade: Second Edition</title>
				<editor>
			<persName><forename type="first">Grégoire</forename><surname>Montavon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Geneviève</forename><forename type="middle">B</forename><surname>Orr</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Klaus-Robert</forename><surname>Müller</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg; Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="437" to="478" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Sequential Labeling with Bidirectional LSTM-CNNs</title>
		<author>
			<persName><forename type="first">Jason</forename><surname>Chiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eric</forename><surname>Nichols</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. International Conf. of Japanese Association for NLP</title>
				<meeting>International Conf. of Japanese Association for NLP</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="937" to="940" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Natural language processing (almost) from scratch</title>
		<author>
			<persName><forename type="first">Ronan</forename><surname>Collobert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jason</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Léon</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Karlen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Koray</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pavel</forename><surname>Kuksa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Mach. Learn. Res</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2493" to="2537" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Framewise phoneme classification with bidirectional lstm and other neural network architectures</title>
		<author>
			<persName><forename type="first">Cicero</forename><surname>dos Santos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bianca</forename><surname>Zadrozny</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alex</forename><surname>Graves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jürgen</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 31st International Conference on Machine Learning</title>
				<meeting>of the 31st International Conference on Machine Learning</meeting>
		<imprint>
			<publisher>JMLR</publisher>
			<date type="published" when="2005">2014. 2005</date>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="602" to="610" />
		</imprint>
	</monogr>
	<note>Learning character-level representations for part-of-speech tagging</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Long short-term memory</title>
		<author>
			<persName><forename type="first">Sepp</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jürgen</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Computation</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1735" to="1780" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Bidirectional LSTM-CRF Models for Sequence Tagging</title>
		<author>
			<persName><forename type="first">Zhiheng</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Wei</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Yu</surname></persName>
		</author>
		<idno>1508.01991</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">ArXiv e-prints</note>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Adam: a method for stochastic optimization</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Ba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. International Conference on Learning Representations -ICLR</title>
				<meeting>International Conference on Learning Representations -ICLR</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1" to="13" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Conditional random fields: Probabilistic models for segmenting and labeling sequence data</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lafferty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mccallum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Pereira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 18th International Conf. on Machine Learning</title>
				<meeting>18th International Conf. on Machine Learning</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="282" to="289" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Efficient Estimation of Word Representations in Vector Space</title>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of Workshop at ICLR</title>
				<meeting>of Workshop at ICLR</meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Early Stopping -But When?</title>
		<author>
			<persName><forename type="first">Lutz</forename><surname>Prechelt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Neural Networks: Tricks of the Trade: Second Edition</title>
				<editor>
			<persName><forename type="first">Grégoire</forename><surname>Montavon</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Geneviève</forename><forename type="middle">B</forename><surname>Orr</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Klaus-Robert</forename><surname>Müller</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin Heidelberg; Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="53" to="67" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">CORIS/CODIS: A corpus of written Italian based on a defined and a dynamic model</title>
		<author>
			<persName><forename type="first">Rema</forename><surname>Rossini Favretti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Tamburini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cristiana</forename><forename type="middle">De</forename><surname>Santis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">A Rainbow of Corpora: Corpus Linguistics and the Languages of the World</title>
				<editor>
			<persName><forename type="first">Andrew</forename><surname>Wilson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Paul</forename><surname>Rayson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Tony</forename><surname>Mcenery</surname></persName>
		</editor>
		<meeting><address><addrLine>Munich</addrLine></address></meeting>
		<imprint>
			<publisher>Lincom-Europa</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="27" to="38" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Semi-supervised condensed nearest neighbor for part-of-speech tagging</title>
		<author>
			<persName><forename type="first">Anders</forename><surname>Søgaard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies<address><addrLine>Portland, Oregon, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="48" to="52" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Dropout: A simple way to prevent neural networks from overfitting</title>
		<author>
			<persName><forename type="first">Nitish</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Geoffrey</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alex</forename><surname>Krizhevsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ruslan</forename><surname>Salakhutdinov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">15</biblScope>
			<biblScope unit="page" from="1929" to="1958" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">AnIta: a powerful morphological analyser for Italian</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Tamburini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Matias</forename><surname>Melandri</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 8th International Conference on Language Resources and Evaluation (LREC 2012)</title>
		<meeting>of the 8th International Conference on Language Resources and Evaluation (LREC 2012)<address><addrLine>Istanbul</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="941" to="947" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">EVALITA 2007: the Part-of-Speech Tagging Task</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Tamburini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Intelligenza Artificiale</title>
		<imprint>
			<biblScope unit="volume">IV</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="4" to="7" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">CORISTagger: a high-performance PoS tagger for Italian</title>
		<author>
			<persName><forename type="first">Fabio</forename><surname>Tamburini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Intelligenza Artificiale</title>
		<imprint>
			<date type="published" when="2007">2007b</date>
			<biblScope unit="volume">IV</biblScope>
			<biblScope unit="page" from="14" to="15" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding</title>
		<author>
			<persName><forename type="first">Peilu</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yao</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Frank</forename><surname>Soong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lei</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hai</forename><surname>Zhao</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1511.00215</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
