<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">ITAmoji 2018: Emoji Prediction via Tree Echo State Networks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Daniele</forename><forename type="middle">Di</forename><surname>Sarli</surname></persName>
							<email>d.disarli@studenti.unipi.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Pisa</orgName>
								<address>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Claudio</forename><surname>Gallicchio</surname></persName>
							<email>gallicch@di.unipi.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Pisa</orgName>
								<address>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alessio</forename><surname>Micheli</surname></persName>
							<email>micheli@di.unipi.it</email>
							<affiliation key="aff0">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Pisa</orgName>
								<address>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">ITAmoji 2018: Emoji Prediction via Tree Echo State Networks</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">48D8EFEE6FA90EE79F5218C120CEB992</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T21:47+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>English.</head><p>For the "ITAmoji" EVALITA 2018 competition we mainly exploit a Reservoir Computing approach to learning, with an ensemble of models for trees and sequences. The sentences for the models of the former kind are processed by a language parser and the words are encoded by using pretrained FastText word embeddings for the Italian language. With our method, we ranked 3 rd out of 5 teams.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Italiano.</head><p>Per la competizione EVALITA 2018 sfruttiamo principalmente un approccio Reservoir Computing, con un ensemble di modelli per sequenze e per alberi. Le frasi per questi ultimi sono elaborate da un parser di linguaggi e le parole codificate attraverso degli embedding FastText preaddestrati per la lingua italiana. Con il nostro metodo ci siamo classificati terzi su un totale di 5 team.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Echo State Networks <ref type="bibr" target="#b11">(Jaeger and Haas, 2004</ref>) are an efficient class of recurrent models under the framework of Reservoir Computing <ref type="bibr" target="#b14">(Lukoševičius and Jaeger, 2009)</ref>, where the recurrent part of the model ("reservoir") is carefully initialized and then left untrained <ref type="bibr" target="#b6">(Gallicchio and Micheli, 2011)</ref>. The only weights that are trained are part of a usually simple readout layer 1 . Echo State Networks were originally designed to work on sequences, however it has been shown how to extend them to deal with recursively structured data, and 1 Trained in closed form, e.g. by Moore-Penrose pseudoinversion, or Ridge Regression. 20.27% 19.86% 9.45% 5.35% 5.13% 4.11%</p><p>3.54% 3.33% 2.80% 2.57%</p><p>2.18% 2.16% 2.03% 1.94% 1.78%</p><p>1.67% 1.55% 1.52% 1.49% 1.39%</p><p>1.37% 1.28% 1.12% 1.07% 1.06%</p><p>Figure <ref type="figure">1</ref>: Emojis under consideration and their frequency within the dataset.</p><p>trees in particular, with Tree Echo State Networks <ref type="bibr" target="#b7">(Gallicchio and Micheli, 2013)</ref>, also referred to as TreeESNs.</p><p>We follow this approach for solving the ITAmoji task in the EVALITA 2018 competition <ref type="bibr">(Ronzano et al., 2018)</ref>. In particular, we parse the input texts into trees resembling the grammatical structure of the sentences, and then we use multiple TreeESN models to process the parse trees and make predictions. We then merge these models by using an ensemble to make our final predictions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Task and Dataset</head><p>Given a set of Italian tweets, the goal of the ITAmoji task is to predict the most likely emoji associated with each tweet. The dataset contains 250,000 tweets in Italian, each of them originally containing only one (possibly repeated) of the 25 emojis considered in the task (see Figure <ref type="figure">1</ref>). The emojis are removed from the sentences and used as targets.</p><p>The test dataset contains 25,000 tweets similarly processed.</p><p>The provided dataset has been shuffled and split into a training set (80%) and a validation set (20%).</p><p>We preprocessed the data by first removing any URL from the sentences, as most of them did not contain any informative content (e.g. "https://t.co/M3StiVOzKC"). We then parsed the sentences by using two different parsers for the Italian language: Tint<ref type="foot" target="#foot_0">2</ref> (Palmero Aprosio and Moretti, 2016) and spaCy <ref type="bibr" target="#b10">(Honnibal and Johnson, 2015)</ref>. This produced two sets of trees, both including information about the dependency relations between the nodes of each tree. We finally replace each word with its corresponding pretrained FastText embedding <ref type="bibr" target="#b12">(Joulin et al., 2016)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Description of the system</head><p>Our ensemble is composed by 13 different models, 12 of which are TreeESNs and the other one is a Long Short-Term Memory (LSTM) over characters. Different random initializations ("trials") of the model parameters are all included in the ensemble in order to enrich the diversity of the hypotheses. We summarize the entire configuration in Table <ref type="table" target="#tab_0">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">TreeESN models</head><p>The TreeESN that we are using is a specialization of the description given by <ref type="bibr" target="#b7">Gallicchio and Micheli (2013)</ref>, and the reader can refer to that work for additional details. Here, the state corresponding to node n of an input tree t is computed as:</p><formula xml:id="formula_0">x(n) = f W in u(n) + 1 k k i=1 Ŵn i x(ch i (n)) ,</formula><p>(1) where u(n) is the label of node n in the input tree, k is the number of children of node n, ch i (n) is the i-th child of node n, W in is the input-toreservoir weight matrix, Ŵn i is the recurrent reservoir weight matrix associated to the grammatical relation between node n and its i-th child, and f is the element-wise applied activation function of the reservoir units (in our case, it is a tanh). All matrices in Equation 1 are left untrained.</p><p>Note that Equation 1 determines a recursive application (bottom-up visit) over each node of the tree t until the state for all nodes is computed, which we can express in structured form as x(t). The resulting tree x(t) is then mapped into a fixedsize feature representation via the χ state mapping function. We make use of mean and sum state mapping functions, respectively yielding the mean and the sum of all the states. The result, χ(x(t)), is then projected into a different space by a matrix W φ :</p><formula xml:id="formula_1">ŷ = f φ (W φ χ(x(t))) ,<label>(2)</label></formula><p>where f φ is an activation function.</p><p>For the readout we use both a linear regression approach with L2 regularization known as Ridge regression <ref type="bibr" target="#b9">(Hoerl and Kennard, 1970</ref>) and a multilayer perceptron (MLP):</p><formula xml:id="formula_2">y = readout(ŷ),<label>(3)</label></formula><p>where y ∈ R 25 is the output vector, which represents a score for each of the classes: the index with the highest value corresponds to the most likely class.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">CharLSTM model</head><p>The CharLSTM model uses a bidirectional LSTM (Hochreiter and Schmidhuber, 1997; <ref type="bibr" target="#b8">Graves and Schmidhuber, 2005)</ref> with 2 layers, which takes as input the characters of the sentences expressed as pretrained character embeddings of size 300. The LSTM output is then fed into a linear layer with 25 output units.</p><p>Similar models have been used in recent works related to emoji prediction, see for example the model used by <ref type="bibr" target="#b0">Barbieri et al. (2017)</ref>, or the one by <ref type="bibr" target="#b3">Baziotis et al. (2018)</ref>, which is however a more complex word-based model.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Ensemble</head><p>We take into consideration two different ensembles, both containing the models in Table <ref type="table" target="#tab_0">1</ref>, but with different strategies for weighting the N P predictions. In the following, let Y ∈ R N P ×25 be the matrix containing one prediction per row.</p><p>The weights for the first ensemble (corresponding to the run file run1.txt) have been produced by a random search: at each iteration we compute a random vector w ∈ R N P with entries sampled from a random variable W 2 , W ∼ U[0, 1]. near-zero weights. After selecting the best configuration on the validation set, the predictions from each of the models are merged together in weighted mean:</p><formula xml:id="formula_3">ȳ = wY<label>(4)</label></formula><p>For the second type of ensemble (corresponding to the run file run2.txt) we adopt a multilayer perceptron. We feed as input the N P predictions concatenated into a single vector y (1...N P ) ∈ R 25N P , so that the model is:</p><formula xml:id="formula_4">ȳ = tanh y (1...N P ) W 1 + b 1 W 2 + b 2 , (5)</formula><p>where the hidden layer has size 259 and the output layer is composed by 25 units.</p><p>In both types of ensemble, as before, the output vector contains a score for each of the classes, providing a way to rank them from the most to the least likely. The most likely class c is thus computed as c = arg max i ȳi .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Training</head><p>The training algorithm differs based on the kind of model taken under consideration. We address each of them in the following paragraphs.</p><p>Models 1-6 The first six models are TreeESNs using a multilayer perceptron as readout. Given the fact that the main evaluation metric for the competition is the Macro F-score, each of the models has been trained by rebalancing the frequencies of the different target classes. In particular, the sampling probability for each input tree has been skewed so that the data extracted during training follows a uniform distribution with respect to the target class. For the readout part we use the Adam algorithm <ref type="bibr" target="#b13">(Kingma and Ba, 2015)</ref> for the stochastic optimization of the multi-class cross entropy loss function.</p><p>Models 7-10 Models from 7 to 10 are again TreeESNs, but with a Ridge Regression readout. In this case, 25 classifiers are trained with a 1-vs-all method, one for each class, using binary targets.</p><p>Models 11-12 Models 11 and 12 are again TreeESNs with a Ridge Regression readout, but they are trained to distinguish only between the most frequent class, the second most frequent class and all the other classes aggregated together. This is done to try to improve the ensemble precision and recall for the top two classes.</p><p>Model 13 The last model is a sequential LSTM over character embeddings. Like in the first 6 models, the Adam algorithm is used to optimize the cross entropy loss function.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Results</head><p>The ensemble seems to bring a substantial improvement to the performance on the validation set, as highlighted in Table <ref type="table" target="#tab_1">2</ref>. This is possible thanks to the number and diversity of the different models, as we can see in Figure <ref type="figure">2</ref> where we show the Pearson correlation coefficients between the predictions of the models in the ensemble.</p><p>On the test set we scored substantially lower,   <ref type="bibr">2018)</ref>. In Figure <ref type="figure">3</ref> we report the confusion matrix (with values normalized over the columns to address label imbalance) and the accuracy over the top-N classes.</p><p>An interesting characteristic of this approach, though, is computation time: we were able to train a TreeESN with 5000 reservoir units over 200,000 trees in just about 25 minutes, and this is without exploiting parallelism between the trees.</p><p>In ITAmoji 2018, our team ranked 3 rd out of 5. Detailed results and rankings are available at http://bit.ly/ITAmoji18.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Discussion and conclusions</head><p>Different authors have highlighted the difference in performance between SVM models and (deep) neural models for emoji prediction, and more in general for text classification tasks, suggesting that simple models like SVMs are more able to capture the features which are most important for generalization: see for example the reports of the SemEval-2018 participants C ¸öltekin and Rama (2018) and <ref type="bibr" target="#b5">Coster et al. (2018)</ref>.</p><p>In this work, instead, we approached the problem from the novel perspective of reservoir computing applied to the grammatical tree structure of the sentences. Despite a significant performance drop on the test set<ref type="foot" target="#foot_1">3</ref> we showed that, paired with a rich ensemble, the method is comparable to the results obtained in the past by other participants in similar competitions using very different models.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 2 :Figure 3 :</head><label>23</label><figDesc>Figure 2: Plot of the correlation between the predictions of the models in the ensemble. For reasons of space, not all labels are shown on the axes.</figDesc><graphic coords="4,88.52,82.59,161.53,161.53" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>The square increases the probability of sampling Composition of the ensemble, highlighting the differences between the models.</figDesc><table><row><cell># Class</cell><cell>Reservoir units</cell><cell>f φ</cell><cell>Readout</cell><cell cols="2">Parser Trials</cell></row><row><cell>1 TreeESN</cell><cell>1000</cell><cell>ReLU</cell><cell>MLP</cell><cell>Tint</cell><cell>10</cell></row><row><cell>2 TreeESN</cell><cell>1000</cell><cell>Tanh</cell><cell>MLP</cell><cell>Tint</cell><cell>10</cell></row><row><cell>3 TreeESN</cell><cell>5000</cell><cell>Tanh</cell><cell>MLP</cell><cell>Tint</cell><cell>1</cell></row><row><cell>4 TreeESN</cell><cell>5000</cell><cell>Tanh</cell><cell>MLP</cell><cell>spaCy</cell><cell>2</cell></row><row><cell>5 TreeESN</cell><cell>5000</cell><cell>ReLU</cell><cell>MLP</cell><cell>Tint</cell><cell>1</cell></row><row><cell>6 TreeESN</cell><cell>5000</cell><cell>ReLU</cell><cell>MLP</cell><cell>spaCy</cell><cell>1</cell></row><row><cell>7 TreeESN</cell><cell>5000</cell><cell cols="2">Tanh Ridge regression</cell><cell>Tint</cell><cell>1</cell></row><row><cell>8 TreeESN</cell><cell>5000</cell><cell cols="3">Tanh Ridge regression spaCy</cell><cell>3</cell></row><row><cell>9 TreeESN</cell><cell>5000</cell><cell cols="2">ReLU Ridge regression</cell><cell>Tint</cell><cell>1</cell></row><row><cell>10 TreeESN</cell><cell>5000</cell><cell cols="3">ReLU Ridge regression spaCy</cell><cell>3</cell></row><row><cell>11 TreeESN</cell><cell>5000</cell><cell cols="2">Tanh Ridge regression</cell><cell>Tint</cell><cell>1</cell></row><row><cell>12 TreeESN</cell><cell>5000</cell><cell cols="3">Tanh Ridge regression spaCy</cell><cell>2</cell></row><row><cell>13 CharLSTM</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>-</cell><cell>1</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Performance obtained on the validation set for the two submitted runs. The columns are, in order, the average and maximum Macro-F1 over the models in the ensemble, and the Macro-F1 and Coverage Error of the ensemble.</figDesc><table><row><cell cols="5">Run Avg F1 Max F1 Ens. F1 CovE</cell></row><row><cell>run1</cell><cell>14.4</cell><cell>18.5</cell><cell>24.9</cell><cell>4.014</cell></row><row><cell>run2</cell><cell>14.4</cell><cell>18.5</cell><cell>26.7</cell><cell>3.428</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Performance on the test set. These values have been obtained by retraining the models over the whole dataset (training set and validation set) after the final model selection phase.with the Macro-F1 and Coverage Errors reported in Table3. These numbers are close to those obtained by the top two models applied to the Spanish language in the "Multilingual Emoji Prediction" task of the SemEval-2018 competition<ref type="bibr" target="#b17">(Barbieri et al., 2018)</ref>, with F1 scores of22.36 and  18.73 (C ¸öltekin and Rama, 2018;<ref type="bibr" target="#b5">Coster et al.,</ref> </figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">Emitting data in the CoNLL-U format<ref type="bibr" target="#b15">(Nivre et al., 2016)</ref>, a revised version of the CoNLL-X format(Buchholz  and Marsi, 2006).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">Probably due to overtraining: we observed that Macro-F1 overcame 0.40 in training.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Miguel</forename><surname>Ballesteros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Horacio</forename><surname>Saggion</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1702.07285</idno>
		<title level="m">Are Emojis Predictable?</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title/>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jose</forename><surname>Camacho-Collados</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">SemEval 2018 Task 2: Multilingual Emoji Prediction</title>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jose</forename><surname>Camacho-Collados</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Ronzano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luis</forename><surname>Espinosa Anke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Miguel</forename><surname>Ballesteros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Valerio</forename><surname>Basile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Viviana</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Horacio</forename><surname>Saggion</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The 12th International Workshop on Semantic Evaluation</title>
				<meeting>The 12th International Workshop on Semantic Evaluation</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="24" to="33" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Predicting Emojis using RNNs with Context-aware Attention</title>
		<author>
			<persName><forename type="first">Christos</forename><surname>Baziotis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nikos</forename><surname>Athanasiou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Georgios</forename><surname>Paraskevopoulos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nikolaos</forename><surname>Ellinas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Athanasia</forename><surname>Kolovou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alexandros</forename><surname>Potamianos</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1804.06657</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Tenth Conference on Computational Natural Language Learning</title>
				<editor>
			<persName><forename type="first">Sabine</forename><surname>Buchholz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Erwin</forename><surname>Marsi</surname></persName>
		</editor>
		<meeting>the Tenth Conference on Computational Natural Language Learning</meeting>
		<imprint>
			<date type="published" when="2006">2018. 2006</date>
			<biblScope unit="page" from="149" to="164" />
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Tübingen-Oslo at SemEval-2018 Task 2: SVMs perform better than RNNs in Emoji Prediction</title>
		<author>
			<persName><forename type="first">Çağrı</forename><surname>Çöltekin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Taraka</forename><surname>Rama</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The 12th International Workshop on Semantic Evaluation</title>
				<meeting>The 12th International Workshop on Semantic Evaluation</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="34" to="38" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Hatching Chick at SemEval-2018 Task 2: Multilingual Emoji Prediction</title>
		<author>
			<persName><forename type="first">Joël</forename><surname>Coster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Reinder</forename><forename type="middle">Gerard</forename><surname>van Dalen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nathalie</forename><forename type="middle">Adriënne Jacqueline</forename><surname>Stierman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The 12th International Workshop on Semantic Evaluation</title>
				<meeting>The 12th International Workshop on Semantic Evaluation</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="445" to="448" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Architectural and Markovian factors of echo state networks</title>
		<author>
			<persName><forename type="first">Claudio</forename><surname>Gallicchio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alessio</forename><surname>Micheli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural Networks</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="440" to="456" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Tree Echo State Networks</title>
		<author>
			<persName><forename type="first">Claudio</forename><surname>Gallicchio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alessio</forename><surname>Micheli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neurocomputing</title>
		<imprint>
			<biblScope unit="volume">101</biblScope>
			<biblScope unit="page" from="319" to="337" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Framewise phoneme classification with bidirectional LSTM networks</title>
		<author>
			<persName><forename type="first">Alex</forename><surname>Graves</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jürgen</forename><surname>Schmidhuber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">;</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jürgen</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IJCNN&apos;05. Proceedings. 2005 IEEE International Joint conference on</title>
				<imprint>
			<date type="published" when="1997">2005. 2005. 1997</date>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="1735" to="1780" />
		</imprint>
	</monogr>
	<note>Long short-term memory</note>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Ridge regression: Biased estimation for nonorthogonal problems</title>
		<author>
			<persName><forename type="first">Arthur</forename><forename type="middle">E</forename><surname>Hoerl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><forename type="middle">W</forename><surname>Kennard</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Technometrics</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="55" to="67" />
			<date type="published" when="1970">1970</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">An Improved Non-monotonic Transition System for Dependency Parsing</title>
		<author>
			<persName><forename type="first">Matthew</forename><surname>Honnibal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Johnson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2015 Conference on Empirical Methods in Natural Language Processing<address><addrLine>Lisbon, Portugal</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2015-09">2015. September</date>
			<biblScope unit="page" from="1373" to="1378" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication</title>
		<author>
			<persName><forename type="first">Herbert</forename><surname>Jaeger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Harald</forename><surname>Haas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Science</title>
		<imprint>
			<biblScope unit="volume">304</biblScope>
			<biblScope unit="issue">5667</biblScope>
			<biblScope unit="page" from="78" to="80" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Bag of Tricks for Efficient Text Classification</title>
		<author>
			<persName><forename type="first">Armand</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edouard</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Piotr</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1607.01759</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Adam: Amethod for stochastic optimization</title>
		<author>
			<persName><forename type="first">P</forename><surname>Diederik</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jimmy</forename><forename type="middle">Lei</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><surname>Ba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Conference on Learning Representations (ICLR)</title>
				<meeting>the 3rd International Conference on Learning Representations (ICLR)</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Reservoir computing approaches to recurrent neural network training</title>
		<author>
			<persName><forename type="first">Mantas</forename><surname>Lukoševičius</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Herbert</forename><surname>Jaeger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Science Review</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="127" to="149" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Universal Dependencies v1: A Multilingual Treebank Collection</title>
		<author>
			<persName><forename type="first">Joakim</forename><surname>Nivre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marie-Catherine</forename><surname>De Marneffe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Filip</forename><surname>Ginter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoav</forename><surname>Goldberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jan</forename><surname>Hajic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ryan</forename><forename type="middle">T</forename><surname>Mcdonald</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Slav</forename><surname>Petrov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sampo</forename><surname>Pyysalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Natalia</forename><surname>Silveira</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">LREC</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Italy goes to Stanford: a collection of CoreNLP modules for Italian</title>
		<author>
			<persName><forename type="first">A</forename><surname>Palmero Aprosio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Moretti</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016-09">2016. September</date>
		</imprint>
	</monogr>
	<note>ArXiv e-prints</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Overview of the EVALITA</title>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Ronzano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francesco</forename><surname>Barbieri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Endang</forename><forename type="middle">Wahyu</forename><surname>Pamungkas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Viviana</forename><surname>Patti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Francesca</forename><surname>Chiusaroli</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018. 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Italian Emoji Prediction (ITAMoji) Task</title>
		<ptr target="CEUR.org" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA&apos;18)</title>
				<editor>
			<persName><forename type="first">Tommaso</forename><surname>Caselli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Nicole</forename><surname>Novielli</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Viviana</forename><surname>Patti</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Paolo</forename><surname>Rosso</surname></persName>
		</editor>
		<meeting>the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA&apos;18)<address><addrLine>Turin, Italy</addrLine></address></meeting>
		<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
