<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Tree LSTMs for Learning Sentence Representations</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Héctor</forename><surname>Cerezo-Costas</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Manuela</forename><surname>Martín-Vicente</surname></persName>
						</author>
						<author>
							<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>González-Casta</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">Universidade de Vigo</orgName>
								<address>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">Edificio CITEXVI</orgName>
								<address>
									<addrLine>local 14</addrLine>
									<postCode>36310</postCode>
									<settlement>Vigo</settlement>
									<region>Pontevedra</region>
									<country>SPA</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff3">
								<orgName type="department">Dept. Enxeñaría Telemática E.E</orgName>
								<orgName type="institution">Telecomunicación Universidade de Vigo</orgName>
								<address>
									<settlement>SPA</settlement>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Tree LSTMs for Learning Sentence Representations</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">912BB46FD34A3A3E4946F847A5389BB5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T09:58+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract xml:lang="it">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>English. In this work we obtain sentence embeddings with a recursive model using dependency graphs as network structure, trained with dictionary definitions. We compare the performance of our recursive Tree-LSTMs against other deep learning models: a recurrent version which considers a sequential connection between sentence elements, and a bag of words model which does not consider word ordering at all. We compare the approaches in an unsupervised similarity task in which general purpose embeddings should help to distinguish related content.</p><p>Italiano. In questo lavoro produciamo sentence embedding con un modello ricorsivo, utilizzando alberi di dipendenze come struttura di rete, addestrandoli su definizioni di dizionario. Confrontiamo le prestazioni dei nostri alberi-LSTM ricorsivi con altri modelli di apprendimento profondo: una rete ricorrente che considera una connessione sequenziale tra le parole della frase, e un modello bag-ofwords, che non ne considera l'ordine. La valutazione dei modelli viene effettutata su un task di similarit non supervisionata, in cui embedding di uso generale aiutano a distinguere i contenuti correlati.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Word embeddings have succeeded in obtaining word semantics and projecting this information in a vector space. <ref type="bibr" target="#b15">(Mikolov et al., 2013)</ref> proposed two methodologies for learning semantic abstractions of words from large volumes of unlabelled data, Skipgram and CBOW, comprised in the word2vec framework. Another approach is GloVe <ref type="bibr" target="#b17">(Pennington et al., 2014)</ref>, which learns from statistical co-occurrences of words. The two conceptually similar algorithms employ a sliding window of words, the context, with the intuition that words appearing frequently together are semantically related and thus should be represented closer in R n . The resulting vectors have shown strong correlation with human annotations in word-analogy tests <ref type="bibr" target="#b4">(Griffiths et al., 2007)</ref>.</p><p>Despite the success of word embeddings in capturing semantic information, they cannot obtain on its own the composition of longer constructions, which is essential for natural language understanding. Thus, several methods using deep neural networks combine word vectors for obtaining sentence representations with linear mappings <ref type="bibr" target="#b1">(Baroni and Zamparelli, 2010)</ref> and deep neural networks, which make use of multiple network layers to obtain higher levels of abstraction <ref type="bibr" target="#b20">(Socher et al., 2012)</ref>. One of the first approaches of obtaining generic embeddings was Paragraph2Vec (Le and <ref type="bibr" target="#b12">Mikolov, 2014)</ref>. Paragraph2Vec can learn unsupervised sentence representations, analogous to word2vec models for word representation, by adding an extra node, indicating the document contribution, to the model.</p><p>Attending to the way the nodes of the network link with each other, two approaches are frequent in NLP: recurrent neural networks and recursive neural networks (RNN)<ref type="foot" target="#foot_0">1</ref> . Recurrent models consider sequential links among words, while recursive models use graph-like structures for organizing the network operations. They process neighbouring words by parsing the tree order (dependency or syntactic graphs), and compute node representations for each parent recursively from the previous step until they reach the root of the tree, which gives the final sentence abstraction.</p><p>In this work, we train a variant of Tree-LSTM models for learning concept abstractions with dic-tionary descriptions as an input. To the best of our knowledge, this is the first attempt to embed dictionaries using such approach. Our model takes complex graph-like structures (e.g. syntactic or dependency graphs) as opposed to the most common approaches which employ recurrent models or unordered distributions of words as representation of the sentences. We use an unsupervised similarity benchmark with the intuition that better sentence embeddings will produce more coincidences with human annotations (comparably to the word analogy task in word embeddings).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>The following recurrent models are capable of obtaining general purpose embeddings of sentences: Skip-thought Vectors, and DictRep.</p><p>Skip-thought Vectors <ref type="bibr" target="#b9">(Kiros et al., 2015)</ref> learns general semantic sentence abstractions with unsupervised training. This concept is similar to the learning of word embeddings with the skipgram model <ref type="bibr" target="#b15">(Mikolov et al., 2013)</ref>. Skip-thoughts tries to code a sentence in such a way that it maximises the probability of recovering the preceding and following sentence in a document.</p><p>DictRep <ref type="bibr" target="#b5">(Hill et al., 2015)</ref> trains RNN networks and BoW models mapping definitions and words with different error functions (cosine similarity and ranking loss). Whilst the RNN models take into account the word orderings, the BoW models are just a weighted combination of the input embeddings. The simplest BoW approach offered competitive results against its RNN counterparts, beating them in most tests <ref type="bibr" target="#b6">(Hill et al., 2016)</ref>.</p><p>Recurrent models have achieved good performance results in different tasks such as polarity detection (e.g. bidirectional LSTMs in <ref type="bibr" target="#b22">(Tai et al., 2015)</ref>), machine translation <ref type="bibr" target="#b3">(Cho et al., 2014)</ref> or sentence similarity detection (e.g. Skip-thoughts), just to name a few.</p><p>Despite being less explored for building general purpose sentence embeddings, in several classification tasks, tree-structured RNNs represent the current state of the art. In their seminal paper, <ref type="bibr" target="#b21">(Socher et al., 2013)</ref> captured complex interactions among words with tensor operations and graph-like links among network nodes. Recursive Neural Tensor Networks (RNTN) networks have been used to solve a simplified version of a QA system in <ref type="bibr" target="#b7">(Iyyer et al., 2014)</ref>.</p><p>In <ref type="bibr" target="#b2">(Bowman, 2013)</ref>, the authors built a natural language inference system using RNTN in a simplified scenario with basic sentence constructions.</p><p>Although the results show that the system is able to learn inference relationships in most cases, it is unclear if this model could be generalised for more complex sentences. RNTNs were subsequently improved by <ref type="bibr" target="#b22">(Tai et al., 2015)</ref>, using LSTMs in the network nodes instead of tensors. With treestructures the network can capture language constructions which greatly affect the polarity of sentences (e.g. negation, polarity reversal, etc.). A more complete benchmark was conducted by <ref type="bibr" target="#b13">(Li et al., 2015)</ref>. There, sequential and recursive RNNs were tested in different tasks: sentiment analysis, question-answer matching, discourse parsing and semantic relation extraction.</p><p>Recursive models excelled in tasks with enough available supervised data, when nodes different from the root are labelled, or when semantic relationships must be extracted from distant words in a sentence.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Approach</head><p>Learning models that build a dictionary of embeddings have solid advantages over other supervised approaches, since they take advantage of large volumes of data that are already available online. The training data of the system are pairs of definition/target word which can be built with dictionaries or encyclopedia descriptions (e.g. picking the first sentences of a description as training data). We follow previous work of <ref type="bibr" target="#b5">(Hill et al., 2015)</ref> that employed dictionaries with sequential connections but using tree structures instead.</p><p>We used the Tree-LSTM as the starting point to build our system. The input to the system are the words conforming a definition together with the structure of the graph with the syntactic/dependency relationships, and the word closer to this definition, i.e. the target. Typically the LSTM nodes are intended for strictly sequential information propagation. Our variant is based in the previous work of ( <ref type="bibr" target="#b22">Tai et al., 2015)</ref>.</p><p>The main differences with the original LSTM node are the presence of two forget gates instead of one and the operation over two previous nodes of the system which modify node states and inhibitor gates. Hence, sub-indexes 1 and 2 are reserved for left and right child nodes of the graph, respectively. In this LSTM node there are no peephole connections between memory states and the inhibitor gates.</p><p>The state value in the root node is fed to the last layer of the system. Then, a non-linear transformation is applied to obtain the sentence embedding. In the basic configuration of the model, the error is measured by calculating the cosine similarity between target and predicted embeddings. The target is the embedding of the word result of the definition. Pre-trained word embeddings or random initialised embeddings might be employed. In the second case, the error is also propagated to the leaf nodes of the graph and thus the word embeddings are updated during training. We did not initialise randomly embeddings because this has consistently produced poorer results in comparison with the same model using pre-trained word embeddings.</p><p>In the network configurations of the tree-LSTM models, we added an extra backward link between the root node and the leaves reversing the uplink path (as hinted in <ref type="bibr" target="#b19">(Socher et al., 2011;</ref><ref type="bibr" target="#b16">Paulus et al., 2014)</ref>). In these settings, the error to minimise is a combination of the target word similarity and the leaves word similarity modulated by a smoothing parameter.</p><p>We implemented our model with Theano (Theano Development Team, 2016) and trained it with minibatch (30) and Adam <ref type="bibr" target="#b8">(Kingma and Ba, 2014)</ref> as optimisation algorithm (with parameters β 1 = 0.9, β 2 = 0.999 and learning rate l = 0.002). This configuration has achieved state of the art performance in other NLP tasks <ref type="bibr" target="#b10">(Kumar et al., 2015)</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>We compared DictRep (BoW and RNN) and our Tree-LSTM variant in a benchmark of unsupervised text similarity tasks and a supervised task (sentiment polarity). These tasks greatly benefit from a good representation of sentences and it requires a lot of human effort to build a dataset.</p><p>DictRep models were trained using available data and online code. For a fair comparison, all models employed the pre-trained word embeddings and training data provided by <ref type="bibr" target="#b5">(Hill et al., 2015)</ref> and cosine similarity as error metric. The configuration setting was similar for all the models.</p><p>Our model employs two connection configurations: The Tree-LSTM with transformed dependency graphs and the sequential mapping of connections, which is conceptually similar to the DictRep-RNN model.</p><p>For SkipThoughts we used the code available online (ski, ) and the pre-trained model with a sentence representation of 4800 dimensions. Additionally, we trained a compressed model with sentence and word representation dimensions of 1200 and 320 respectively in about three weeks. Like in the available model, the 80 million registers of the BookCorpus dataset <ref type="bibr" target="#b24">(Zhu et al., 2015)</ref> were used during the training process.</p><p>The objective of the semantic similarity task benchmark is to measure the similarity between a pair of sentences. SemEval STS 2014 <ref type="bibr" target="#b0">(Agirre et al., 2014)</ref> and SICK <ref type="bibr" target="#b14">(Marelli et al., 2014)</ref> datasets were used for benchmarks. In both datasets, each example was gold-standard ranked between 0 (totally unrelated sentences) and 5 (completely similar). Furthermore, SICK dataset considers three different types of semantic relatedness (Neutral, Entailment and Contradiction). We tested the models against the three relations to check if recursive and recurrent models exhibited different behaviour.</p><p>This is the same dataset used in previous work <ref type="bibr" target="#b6">(Hill et al., 2016)</ref> but excluding the WordNet set, since it was used as part of the training.</p><p>For the sentiment polarity, we used as training/validation data the Sentiment Penn Treebank dataset<ref type="foot" target="#foot_2">2</ref> . In this dataset, each sentence node is labelled with a 5-tag intensity tag from 0, the most negative, to 4. Sentences are already binarised in the same format of our TreeDict approach so that no preprocessing is needed in this task for TreeModels. We used for training and test the labels at the root node which is the the overall sentence polarity. For completeness, we repeat the analysis for a 3-label annotations over the same dataset. We used the same SVM classifier for all the models and we trained it with the sentence vectors as input.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results and conclusion</head><p>The DictRep BoW model was undeniably better than the recurrent and recursive models achieving the best position in all cases (Table <ref type="table" target="#tab_0">1</ref>). The TreeDict-Dep model ranked second<ref type="foot" target="#foot_3">3</ref> . All models capture the correlations with human annotations better in neutral contexts. If there are contradictions and entailment relationships, the agreement with human annotations is less evident. Nevertheless, this behaviour is expected and also desirable, as this is an unsupervised benchmark and the system has no way of learning a similar but conflicting relationship without external help.</p><p>It is clear that BoW models offered the best performance in all the datasets. The Tree-LSTM model, which is consistently better than the sequential models, ranked second. Table <ref type="table" target="#tab_1">2</ref> shows the correlation among models over the SICK similarity dataset. All the models experience strong cross-correlations between them but the Tree-LSTM with dependency parsing showed the closest correlation with the BoW and recurrent models.</p><p>The Table <ref type="table" target="#tab_2">3</ref> shows the performance of the models in the supervised polarity tasks. BoW and SkipThoughts models experience similar outcomes for the 5 and 3 label task. Models trained with dictionary definitions (DictRep and TreeDict) lag behind those models. However, all the networks using dependency structures have consistently beaten its sequential counterparts. This is a strong indicative of the benefits of using this more complex network structure. The difference between the different network configurations of the same model are less pronounced that in the similarity tasks but in our tests, the models that used the extra link backwards achieved small gains (at least in the 3-label task).</p><p>In previous work, <ref type="bibr" target="#b6">(Hill et al., 2016)</ref> compared other models in this same similarity benchmark achieving comparable results. Not only DictRep-BoW models outperformed the DictRep-RNNs but also the Skip-thought model, which considers the order of the words in a sentence, was beaten by FastSent, its counterpart that employs BoW representation of a sentence. The effect of word orderings is not clear. BoW models are far from being ideal as they cannot obtain which parts are negated or the dependencies among the different elements of the sentence (e.g. the black dog chases the white cat and the black cat chases the white dog cannot be differentiated by only using BoW models).</p><p>It is important to mention that the similarity was tested only at the root node when using Tree-LSTM. Notwithstanding, recursive models allow to use more elaborated strategies, taking advantage of the dependencies used to build the relationships of the nodes in the deep network. These strategies could combine similarities at different levels of the sentence to obtain a more approximate value of similarity (e.g. using a pooling matrix with all the nodes of the parse tree <ref type="bibr" target="#b19">(Socher et al., 2011)</ref>).</p><p>The errors during training time in held-out data were 0.57 for BoW models versus the 0.51 achieved by recurrent and recursive models. Nevertheless, better dictionary embeddings do not seem to directly translate into better performance at inferring general purpose sentence embeddings in the benchmarks. 
Results in the tests also show that we need better mechanisms to infer sentence-level representations. In this paper we introduced the use of recursive models for the generation of general purpose embeddings, trained by embedding dictionary definitions. We compared recurrent and recursive models on the dictionary embedding task and tested the validity of the resulting embeddings as general purpose codifications of sentences.</p><p>Results demonstrate slight advantages of the recursive tree variant over recurrent models that learn from dictionaries, which are more frequently employed. Recursive models are computationally more expensive and have a more complex implementation, but they exhibit better performance on longer sentences. However, with current learning techniques, recurrent and recursive models cannot offer better results than simpler models such as BoW representations of sentences in unsupervised similarity benchmarks. These findings should be confirmed in the future in more complex scenarios, such as large-scale QA.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Tree-LSTM schema employed. Dotted blocks and lines depict the optional reverse channel.</figDesc><graphic coords="4,117.35,62.81,362.85,188.93" type="bitmap" /></figure>
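The BoW limitation mentioned in the discussion (the dog/cat example) is easy to verify: with an averaging BoW encoder, sentences containing the same multiset of words receive identical embeddings. A small sketch with random stand-in embeddings:

```python
import numpy as np

# Random stand-ins for pre-trained word embeddings.
rng = np.random.default_rng(1)
emb = {w: rng.normal(size=8)
       for w in ["the", "black", "white", "dog", "cat", "chases"]}

def bow(sentence):
    # Averaging BoW encoder: word order is discarded entirely.
    return np.mean([emb[w] for w in sentence.split()], axis=0)

s1 = "the black dog chases the white cat"
s2 = "the black cat chases the white dog"
assert np.allclose(bow(s1), bow(s2))  # identical embeddings
```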
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Performance of the models measured with Spearman/Pearson correlations against golden standard annotations in the similarity benchmarks.</figDesc><table><row><cell></cell><cell></cell><cell></cell><cell></cell><cell>STS 2014</cell><cell></cell><cell></cell><cell>Sick</cell></row><row><cell>Model</cell><cell></cell><cell>News</cell><cell cols="4">Forum Twitter Images Headlines</cell><cell>Neu</cell><cell>Ent</cell><cell>Con</cell><cell>All</cell></row><row><cell cols="2">DictRep-BoW</cell><cell cols="4">.67/.74 .42/.39 .60/.65 .71/.74</cell><cell>.58/.62</cell><cell>.60/.70 .58/.56 .12/.18 .62/.72</cell></row><row><cell cols="2">DictRep-RNN</cell><cell cols="4">.45/.52 .06/.04 .30/.32 .57/.57</cell><cell>.39/.42</cell><cell>.52/.59 .22/.23 .09/.10 .48/.56</cell></row><row><cell cols="2">TreeDict-Seq</cell><cell cols="4">.48/.54 .24/.23 .40/.45 .60/.64</cell><cell>.46/.51</cell><cell>.51/.59 .24/.27 .07/.10 .51/.59</cell></row><row><cell cols="2">TreeDict-Seq 250</cell><cell cols="4">.50/.58 .20/.21 .44/.47 .61/.66</cell><cell>.46/.49</cell><cell>.56/.62 .27/.30 .08/.11 .54/.64</cell></row><row><cell cols="6">TreeDict-Seq 250BL .47/.47 .23/.21 .52/.59 .51/.51</cell><cell>.43/.45</cell><cell>.48/.52 .29/.33 .10/.14 .51/.56</cell></row><row><cell cols="2">TreeDict-Dep</cell><cell cols="2">.48/.55 .29/.28</cell><cell>-</cell><cell>.61/.67</cell><cell>-</cell><cell>.56/.64 .35/.39 .08/.13 .55/.65</cell></row><row><cell cols="2">TreeDict-Dep 250</cell><cell cols="2">.50/.56 .31/.30</cell><cell>-</cell><cell>.56/.63</cell><cell>-</cell><cell>.55/.61 .36/.41 .09/.12 .56/.63</cell></row><row><cell cols="4">TreeDict-Dep 250BL .43/.45 .30/.28</cell><cell>-</cell><cell>.56/.58</cell><cell>-</cell><cell>.52/.56 .34/.38 .09/.11 .55/.60</cell></row><row><cell cols="2">SkipThoughts-4800</cell><cell cols="4">.43/.23 .13/.13 .42/.40 .48/.51</cell><cell>.36/.37</cell><cell>.49/.49 .19/.25 .10/.15 .48/.50</cell></row><row><cell cols="2">SkipThoughts-1200</cell><cell cols="2">.55/.54 .22/.23</cell><cell>-</cell><cell>.55/.61</cell><cell>.39/.41</cell><cell>.56/.56 .21/.24 .09/.15 .53/.56</cell></row><row><cell cols="5">Model D.BoW D.RNN T.Seq T.Penn</cell><cell></cell><cell></cell></row><row><cell cols="5">D.BoW 1.0/1.0 .70/.71 .74/75 .80/.82</cell><cell></cell><cell></cell></row><row><cell cols="5">D.RNN .70/.71 1.0/1.0 .77/.75 .73/.72</cell><cell></cell><cell></cell></row><row><cell>T.Seq</cell><cell cols="4">.74/.75 .77/.75 1.0/1.0 .79/.78</cell><cell></cell><cell></cell></row><row><cell>T.Dep</cell><cell cols="4">.80/.82 .73/.72 .78/.78 1.0/1.0</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Spearman/Pearson correlations among the different models in the SICK dataset.</figDesc><table><row><cell>Model</cell><cell cols="2">F 1 -score</cell></row><row><cell></cell><cell cols="2">(5-label) (3-label)</cell></row><row><cell>DictRep-BoW</cell><cell>.40</cell><cell>.56</cell></row><row><cell>DictRep-RNN</cell><cell>.32</cell><cell>.49</cell></row><row><cell>TreeDict-Seq</cell><cell>.31</cell><cell>.49</cell></row><row><cell>TreeDict-Seq 250</cell><cell>.32</cell><cell>.48</cell></row><row><cell>TreeDict-Seq 250BL</cell><cell>.32</cell><cell>.49</cell></row><row><cell>TreeDict-Dep</cell><cell>.35</cell><cell>.53</cell></row><row><cell>TreeDict-Dep 250</cell><cell>.35</cell><cell>.51</cell></row><row><cell>TreeDict-Dep 250BL</cell><cell>.35</cell><cell>.53</cell></row><row><cell>SkipThoughts-4800</cell><cell>.40</cell><cell>.56</cell></row><row><cell>SkipThoughts-1200</cell><cell>.38</cell><cell>.55</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>Performance of the models in the polarity detection task</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">We use the same classification as in(Li et al.,  </note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2015" xml:id="foot_1">).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_2">http://nlp.stanford.edu/sentiment/treebank.html</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_3">The character "-" indicates that some vectors for a sentence could not be obtained (e.g. due to a malformed dependency graph)</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been funded by the Spanish Ministerio de Economa y Competitividad through the project INRISCO (TEC2014-54335-C4-4-R).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Semeval-2014 task 10: Multilingual Semantic Textual Similarity</title>
		<author>
			<persName><forename type="first">Eneko</forename><surname>Agirre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Carmen</forename><surname>Banea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Claire</forename><surname>Cardie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Daniel</forename><surname>Cer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mona</forename><surname>Diab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aitor</forename><surname>Gonzalez-Agirre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Weiwei</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rada</forename><surname>Mihalcea</surname></persName>
		</author>
		<author>
			<persName><forename type="first">German</forename><surname>Rigau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Janyce</forename><surname>Wiebe</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th international workshop on semantic evaluation (SemEval 2014)</title>
				<meeting>the 8th international workshop on semantic evaluation (SemEval 2014)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="81" to="91" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space</title>
		<author>
			<persName><forename type="first">Marco</forename><surname>Baroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roberto</forename><surname>Zamparelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2010 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="1183" to="1193" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Can recursive neural tensor networks learn logical reasoning</title>
		<author>
			<persName><surname>Samuel R Bowman</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1312.6192</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Learning Phrase Representations using RNN Encoderdecoder for Statistical Machine Translation</title>
		<author>
			<persName><forename type="first">Kyunghyun</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bart</forename><surname>Van Merriënboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Caglar</forename><surname>Gulcehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dzmitry</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Fethi</forename><surname>Bougares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Holger</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1406.1078</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Topics in Semantic Representation</title>
		<author>
			<persName><forename type="first">Thomas</forename><forename type="middle">L</forename><surname>Griffiths</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Mark</forename><surname>Steyvers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joshua</forename><forename type="middle">B</forename><surname>Tenenbaum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Psychological review</title>
		<imprint>
			<biblScope unit="volume">114</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page">211</biblScope>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Learning to Understand Phrases by Embedding the Dictionary</title>
		<author>
			<persName><forename type="first">Felix</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kyunghyun</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anna</forename><surname>Korhonen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association for Computational Linguistics</title>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">Felix</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kyunghyun</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anna</forename><surname>Korhonen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1602.03483</idno>
		<title level="m">Learning Distributed Representations of Sentences from Unlabelled Data</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A Neural Network for Factoid Question Answering over Paragraphs</title>
		<author>
			<persName><forename type="first">Mohit</forename><surname>Iyyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jordan</forename><forename type="middle">L</forename><surname>Boyd-Graber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leonardo</forename><forename type="middle">Max Batista</forename><surname>Claudino</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hal</forename><surname>Daumé</surname><genName>III</genName></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EMNLP</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="633" to="644" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Adam: A Method for Stochastic Optimization</title>
		<author>
			<persName><forename type="first">Diederik</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jimmy</forename><surname>Ba</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1412.6980</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Skip-Thought Vectors</title>
		<author>
			<persName><forename type="first">Ryan</forename><surname>Kiros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yukun</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ruslan</forename><forename type="middle">R</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Zemel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Raquel</forename><surname>Urtasun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Antonio</forename><surname>Torralba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sanja</forename><surname>Fidler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="3294" to="3302" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Ask Me Anything: Dynamic Memory Networks for Natural Language Processing</title>
		<author>
			<persName><forename type="first">Ankit</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ozan</forename><surname>Irsoy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jonathan</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">James</forename><surname>Bradbury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><surname>English</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Brian</forename><surname>Pierce</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Peter</forename><surname>Ondruska</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ishaan</forename><surname>Gulrajani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1506.07285</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">Ask Me Anything: Dynamic Memory Networks for Natural Language Processing</title>
		<idno type="arXiv">arXiv:1506.07285</idno>
		<imprint/>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Distributed Representations of Sentences and Documents</title>
		<author>
			<persName><forename type="first">Quoc</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ICML</title>
		<imprint>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="1188" to="1196" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">Jiwei</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Minh-Thang</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dan</forename><surname>Jurafsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eudard</forename><surname>Hovy</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1503.00185</idno>
		<title level="m">When Are Tree Structures Necessary for Deep Learning of Representations</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A SICK cure for the evaluation of compositional distributional semantic models</title>
		<author>
			<persName><forename type="first">Marco</forename><surname>Marelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stefano</forename><surname>Menini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marco</forename><surname>Baroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Luisa</forename><surname>Bentivogli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Raffaella</forename><surname>Bernardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Roberto</forename><surname>Zamparelli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">LREC</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="216" to="223" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Efficient Estimation of Word Representations in Vector Space</title>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Global belief recursive neural networks</title>
		<author>
			<persName><forename type="first">Romain</forename><surname>Paulus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="2888" to="2896" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Glove: Global Vectors for Word Representation</title>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">EMNLP</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">14</biblScope>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<monogr>
		<ptr target="https://github.com/ryankiros/skip-thoughts.Ac-cessed:2017-07-07" />
		<title level="m">Sent2Vec encoder and training code from the paper</title>
				<imprint/>
	</monogr>
	<note>Skip-Thought Vectors. Accessed: 2017-07-07</note>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection</title>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Eric</forename><forename type="middle">H</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Pennin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="801" to="809" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Semantic Compositionality through Recursive Matrix-vector Spaces</title>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Brody</forename><surname>Huval</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning</title>
				<meeting>the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="1201" to="1211" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Recursive Deep Models for Semantic Compositionality over a Sentiment Treebank</title>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Alex</forename><surname>Perelygin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jean</forename><forename type="middle">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jason</forename><surname>Chuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Andrew</forename><forename type="middle">Y</forename><surname>Ng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><surname>Potts</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="volume">1631</biblScope>
			<biblScope unit="page">1642</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<title level="m" type="main">Improved Semantic Representations from Tree-structured Long Short-term Memory Networks</title>
		<author>
			<persName><forename type="first">Kai</forename><forename type="middle">Sheng</forename><surname>Tai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">ACL</note>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<idno>arXiv e-prints, abs/1605.02688</idno>
		<title level="m">Theano: A Python framework for fast computation of mathematical expressions</title>
		<author>
			<persName><surname>The Theano Development Team</surname></persName>
		</author>
				<imprint>
			<date type="published" when="2016-05">2016. May</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books</title>
		<author>
			<persName><forename type="first">Yukun</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ryan</forename><surname>Kiros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Richard</forename><surname>Zemel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ruslan</forename><surname>Salakhutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Raquel</forename><surname>Urtasun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Antonio</forename><surname>Torralba</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sanja</forename><surname>Fidler</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1506.06724</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
