<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Representation of Word Sentiment, Idioms and Senses</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Giuseppe</forename><surname>Attardi</surname></persName>
							<email>attardi@di.unipi.it</email>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica Università di Pisa Largo B. Pontecorvo</orgName>
								<address>
									<postCode>I-56127</postCode>
									<settlement>Pisa</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Representation of Word Sentiment, Idioms and Senses</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">F4BEA688279B9D3C514FE85124987A3E</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:46+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Distributional Semantic Models (DSM) that represent words as vectors of weights over a high-dimensional feature space have proved very effective in representing semantic or syntactic word similarity. For certain tasks, however, it is important to represent contrasting aspects such as polarity, different senses or idiomatic use of words. We present two methods for creating embeddings that take such characteristics into account: a feed-forward neural network for learning sentiment-specific embeddings and a skip-gram model for learning sense-specific embeddings. Sense-specific embeddings can be used for disambiguating queries and for other classification tasks. We also present an approach for recognizing idiomatic expressions by means of the embeddings, which can be used to segment queries into meaningful chunks. The implementation is available as a library written in Python, with core numerical processing written in C++ using a parallel linear algebra library for efficiency and scalability.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Distributional Semantic Models (DSM) that represent words as vectors of weights over a high-dimensional feature space <ref type="bibr" target="#b12">[13]</ref> have proved very effective in representing semantic or syntactic aspects of the lexicon. Incorporating such representations has improved many natural language processing tasks. They also reduce the burden of feature selection, since these models can be learned from plain text through unsupervised techniques. Deep learning algorithms for NLP tasks exploit distributional representations of words. In tagging applications such as POS tagging, NER tagging and Semantic Role Labeling (SRL), this has proved quite effective in reaching state-of-the-art accuracy and reducing reliance on manually engineered feature selection <ref type="bibr" target="#b7">[8]</ref>.</p><p>Word embeddings have also been exploited in constituency parsing <ref type="bibr" target="#b7">[8]</ref> and dependency parsing <ref type="bibr" target="#b3">[4]</ref>. Blanco et al. <ref type="bibr" target="#b2">[3]</ref> exploit word embeddings for identifying entities in web search queries. This paper presents DeepNL, an NLP pipeline based on a common Deep Learning architecture: it consists of tools for creating embeddings and of tools that exploit word embeddings as features. The current release includes a POS tagger, a NER tagger, an SRL tagger and a dependency parser.</p><p>Two methods are supported for creating embeddings: one based on a neural network and one based on Hellinger PCA <ref type="bibr" target="#b14">[15]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Building Word Embeddings</head><p>Word embeddings provide a low-dimensional dense vector space representation for words, where values in each dimension may represent syntactic or semantic properties.</p><p>DeepNL provides two methods for building embeddings: one based on the use of a neural language model, as proposed by <ref type="bibr" target="#b25">[26,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b17">18]</ref>, and one based on a spectral method, as proposed by Lebret and Collobert <ref type="bibr" target="#b14">[15]</ref>.</p><p>The neural language model method can be hard to train and the process is often quite time consuming, since several iterations are required over the whole training set. Some researchers provide precomputed embeddings for English<ref type="foot" target="#foot_0">1</ref>. The Polyglot project <ref type="bibr" target="#b0">[1]</ref> makes available embeddings for several languages, built from the plain text of Wikipedia in the respective language, together with the Python code for computing them<ref type="foot" target="#foot_1">2</ref>, which supports GPU computation by means of Theano<ref type="foot" target="#foot_2">3</ref>.</p><p>Mikolov et al. <ref type="bibr" target="#b19">[20]</ref> developed an alternative solution for computing word embeddings which significantly reduces the computational cost. They propose two log-linear models, the continuous bag-of-words model and the skip-gram model. The bag-of-words approach is similar to a feed-forward neural network language model and learns to classify the current word in a given context, except that instead of concatenating the vectors of the words in the context window of each token, it just averages them, eliminating a network layer and reducing the data dimensions. The skip-gram model instead tries to predict the context words based on the current word. A further speed-up in the computation is obtained by exploiting a mini-batch Asynchronous Stochastic Gradient Descent algorithm, splitting the training corpus into partitions and assigning them to multiple threads. An optimistic approach is also exploited to avoid synchronization costs: updates to the current weight matrix are performed concurrently, without any locking, assuming that updates to the same rows of the matrix will be infrequent and will not harm convergence.</p><p>The authors published single-machine multi-threaded C++ code for computing the word vectors<ref type="foot" target="#foot_3">4</ref>. A reimplementation of the algorithm in Python is included in the Gensim library <ref type="bibr" target="#b22">[23]</ref>. In order to obtain speed comparable to the C++ version, it uses Cython to interface with a C implementation of the core function that trains the network on a single sentence, which in turn exploits the BLAS library for algebraic computations.</p></div>
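<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of how such embeddings are typically trained, the following sketch uses the Gensim reimplementation mentioned above. It assumes a toy tokenized corpus and Gensim 4.x parameter names (earlier versions use size instead of vector_size) and is not part of DeepNL:</p><p>from gensim.models import Word2Vec

# toy tokenized corpus; in practice sentences would be streamed from disk
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "chased", "the", "cat"]]

# sg=1 selects the skip-gram model; workers sets the number of training threads
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1, workers=4)

vector = model.wv["cat"]                      # the learned embedding for "cat"
print(model.wv.most_similar("cat", topn=3))</p></div>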
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Word Embeddings through Hellinger PCA</head><p><ref type="bibr" target="#b14">Lebret and Collobert [15]</ref> have shown that embeddings can be efficiently computed from word co-occurrence counts, applying Principal Component Analysis (PCA) to reduce dimensionality while optimizing the Hellinger similarity distance.</p><p>Levy and Goldberg <ref type="bibr" target="#b15">[16]</ref> have shown similarly that the skip-gram model by Mikolov et al. <ref type="bibr" target="#b19">[20]</ref> can be interpreted as implicitly factorizing a word-context matrix, whose values are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant.</p><p>DeepNL provides an implementation of the Hellinger PCA algorithm using Cython and the LAPACK routine SYEVR.</p><p>Co-occurrence frequencies are computed by counting the number of times each context word w ∈ D occurs after a sequence of words T:</p><formula xml:id="formula_0">p(w|T) = p(w, T) / p(T) = n(w, T) / ∑_w n(w, T)</formula><p>where n(w, T) is the number of times word w occurs after the sequence of words T. The set D of context words is normally chosen as the subset of the most frequent words in the vocabulary V.</p><p>The word co-occurrence matrix C of size |V| × |D| is built. The coefficients of C are square rooted and its transpose is then multiplied by it to obtain a symmetric square matrix of size |V| × |V|, to which PCA is applied to obtain the desired dimensionality reduction.</p></div>
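<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea can be sketched in a few lines of NumPy (an illustration of the method only, not the Cython/LAPACK implementation used in DeepNL): build the co-occurrence count matrix, normalize its rows to conditional probabilities, take element-wise square roots, and reduce the dimensionality with PCA computed through an SVD:</p><p>import numpy as np

def hellinger_pca(counts, dim):
    """counts[i, j]: co-occurrence count of vocabulary word i with context word j."""
    probs = counts / counts.sum(axis=1, keepdims=True)   # rows as distributions over contexts
    root = np.sqrt(probs)                                 # Hellinger: compare square-rooted distributions
    centered = root - root.mean(axis=0)                   # center the data for PCA
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:dim].T                          # |V| x dim embedding matrix

# toy example: 5 vocabulary words, 4 context words, 2-dimensional embeddings
counts = np.random.randint(1, 10, size=(5, 4)).astype(float)
embeddings = hellinger_pca(counts, 2)</p></div>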
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Context Sensitive Word Embeddings</head><p>The meaning of words often depends on their context. Our approach for learning word embeddings in context is inspired by the method for learning paragraph vectors <ref type="bibr" target="#b13">[14]</ref>. We improve on their approach, avoiding the cost of computing at query time an embedding for the paragraph of the query. Our solution bears some resemblance to the approach in <ref type="bibr" target="#b11">[12]</ref>. We add padding at sentence boundaries and substitute &lt;UNK&gt; for OOV words.</p><p>The prediction task is performed by a neural network with a softmax layer:</p><formula xml:id="formula_1">p(w_t | w_{t−k}, …, w_{t+k}) = e^{y_{w_t}} / ∑_i e^{y_i}</formula><p>Each y_i is the score for output word i, computed as: y = b + U h(w_{t−k}, …, w_{t+k}; W, D), where b and U are network parameters, and W and D are the weight matrices for words and paragraphs respectively. h concatenates the vectors of each word extracted from W and of their sum multiplied by D:</p><formula xml:id="formula_2">h(i_1, …, i_n; W, D) = W_{i_1} || … || W_{i_n} || D ∑_{j=1}^{n} W_{i_j}</formula><p>The combination of word vector and paragraph vector can be used for word sense disambiguation. A word i_k within paragraph i_1, …, i_n is represented by the concatenation</p><formula xml:id="formula_3">W_{i_k} || D ∑_{j=1}^{n} W_{i_j}</formula></div>
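<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a concrete sketch of this combination (with made-up dimensions and random matrices, not the DeepNL code), the context-sensitive representation of a word concatenates its own column of W with the paragraph summary obtained by multiplying the sum of the word vectors by D:</p><p>import numpy as np

d, p, V = 50, 20, 1000            # word dim, paragraph-summary dim, vocabulary size (made up)
W = np.random.randn(d, V)         # word embedding matrix
D = np.random.randn(p, d)         # paragraph projection matrix

def context_vector(word_ids, k):
    """Context-sensitive representation of the k-th word of a paragraph (list of word ids)."""
    paragraph = D @ W[:, word_ids].sum(axis=1)     # D times the sum of the word vectors
    return np.concatenate([W[:, word_ids[k]], paragraph])

vec = context_vector([3, 17, 256, 42], k=2)        # a (d + p)-dimensional vector</p></div>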
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Sentiment Specific Word Embeddings</head><p>For the task of sentiment analysis, semantic similarity is not appropriate, since antonyms end up at close distance in the embedding space. One needs to learn a vector representation where words of opposite polarity are far apart. Tang et al. <ref type="bibr" target="#b24">[25]</ref> propose an approach for learning sentiment specific word embeddings, by incorporating supervised knowledge of polarity in the loss function of the learning algorithm. The original hinge loss function in the algorithm by Collobert et al. <ref type="bibr" target="#b5">[6]</ref> is:</p><formula xml:id="formula_4">L_CW(x, x^c) = max(0, 1 − f_θ(x) + f_θ(x^c))</formula><p>where x is an ngram and x^c is the same ngram corrupted by replacing the target word with a randomly chosen one, and f_θ(•) is the feature function computed by the neural network with parameters θ. The sentiment specific network outputs a vector of two dimensions, one for modeling the generic syntactic/semantic aspects of words and the second for modeling polarity.</p><p>A second loss function is introduced as objective for minimization:</p><formula xml:id="formula_5">L_SS(x, x^c) = max(0, 1 − δ_s(x) f_θ(x)_1 + δ_s(x) f_θ(x^c)_1)</formula><p>where the subscript in f_θ(x)_1 refers to the second element of the output vector and δ_s(x) is an indicator function reflecting the sentiment polarity of a sentence, whose value is 1 if the sentiment polarity of x is positive and -1 if it is negative. The overall hinge loss is a linear combination of the two:</p><formula xml:id="formula_6">L(x, x^c) = α L_CW(x, x^c) + (1 − α) L_SS(x, x^c)</formula><p>DeepNL provides an algorithm for training polarized embeddings, performing gradient descent using an adaptive learning rate according to the AdaGrad method. The algorithm requires a training set consisting of sentences annotated with their polarity, for example a corpus of tweets. The algorithm builds embeddings for both unigrams and ngrams at the same time, by performing variations on a training sentence that replace not just a single word, but a sequence of words, with either another word or another ngram.</p></div>
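<div xmlns="http://www.tei-c.org/ns/1.0"><p>The objective can be written down directly from the formulas above. The following sketch is only an illustration (plain Python/NumPy, taking the first component of the network output for the syntactic term and the second for the sentiment term), not the DeepNL training code:</p><p>import numpy as np

def hinge_loss(f_x, f_xc, delta_s, alpha):
    """f_x, f_xc: two-dimensional network outputs for the original and corrupted ngram
    (first component: syntactic/semantic score, second: sentiment score);
    delta_s: +1 or -1 polarity of the sentence; alpha: weight of the syntactic term."""
    l_cw = max(0.0, 1.0 - f_x[0] + f_xc[0])                        # syntactic hinge loss
    l_ss = max(0.0, 1.0 - delta_s * f_x[1] + delta_s * f_xc[1])    # sentiment hinge loss
    return alpha * l_cw + (1.0 - alpha) * l_ss

print(hinge_loss(np.array([0.9, 0.8]), np.array([0.1, -0.2]), delta_s=1, alpha=0.5))</p></div>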
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Deep Learning Architecture</head><p>DeepNL adopts a multi-layer neural network architecture, as proposed in <ref type="bibr" target="#b5">[6]</ref>, consisting of five layers: a lookup layer, a linear layer, an activation layer (e.g. hardtanh), a second linear layer and a softmax layer. Overall, the network computes the following function:</p><formula xml:id="formula_7">f(x) = softmax(M_2 a(M_1 x + b_1) + b_2)</formula><p>where</p><formula xml:id="formula_8">M_1 ∈ ℝ^{h×d}, b_1 ∈ ℝ^h, M_2 ∈ ℝ^{o×h}, b_2 ∈ ℝ^o</formula><p>are the parameters, with d the dimension of the input, h the number of hidden units, o the number of output classes, and a(•) the activation function.</p></div>
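<div xmlns="http://www.tei-c.org/ns/1.0"><p>For concreteness, the forward computation of this network can be written in a few lines of NumPy (a sketch with arbitrary dimensions, not the library code):</p><p>import numpy as np

d, h, o = 150, 300, 10                        # input, hidden and output sizes (arbitrary)
M1, b1 = np.random.randn(h, d), np.zeros(h)
M2, b2 = np.random.randn(o, h), np.zeros(o)

def hardtanh(z):
    return np.clip(z, -1.0, 1.0)

def forward(x):
    scores = M2 @ hardtanh(M1 @ x + b1) + b2
    e = np.exp(scores - scores.max())         # softmax, shifted for numerical stability
    return e / e.sum()

probs = forward(np.random.randn(d))           # vector of o class probabilities</p></div>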
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Lookup layer</head><p>The first layer of the network transforms the input into a feature vector representation. Individual words are represented by a vector of features, which is trained by backpropagation.</p><p>For each word w ∈ 𝒟, an internal d-dimensional feature vector representation is given by the lookup table layer LT_W(•):</p><formula xml:id="formula_9">LT_W(w) = ⟨W⟩_w</formula><p>where W ∈ ℝ^{d×|𝒟|} is a matrix of parameters to be learned, ⟨W⟩_w ∈ ℝ^d is the w-th column of W and d is the word vector size (a hyper-parameter to be chosen by the user).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Discrete Features</head><p>Besides word representations, a number of discrete features can be used. Each feature has its own lookup table LT_{W^k}(•) with parameters W^k ∈ ℝ^{d_k×|𝒟_k|}, where 𝒟_k is the dictionary for the k-th feature and d_k is a user specified vector size. The input to the network becomes the concatenation of the vectors for all features:</p><p>LT_{W^1}(w) || LT_{W^2}(w) || ⋯ || LT_{W^K}(w)</p></div>
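<div xmlns="http://www.tei-c.org/ns/1.0"><p>A sketch of this input construction (with hypothetical feature tables and sizes, not the DeepNL lookup code): each feature contributes its own embedding and the per-token vectors are concatenated:</p><p>import numpy as np

# one lookup table per feature type, each with its own dictionary and vector size (made up)
word_table = np.random.randn(50, 10000)    # d_1 x |D_1|: word embeddings
caps_table = np.random.randn(5, 4)         # d_2 x |D_2|: lower, upper, title, mixed case

def token_input(word_id, caps_id):
    # the network input for one token is the concatenation of its feature vectors
    return np.concatenate([word_table[:, word_id], caps_table[:, caps_id]])

x = token_input(word_id=42, caps_id=0)      # a 55-dimensional input vector</p></div>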
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Sequence Taggers</head><p>For sequence tagging, two approaches were proposed in <ref type="bibr" target="#b5">[6]</ref>, a window approach and a sentence approach. The window approach assumes that the tag of a word depends mainly on the neighboring words, and is suitable for tasks like POS and NE tagging.</p><p>The sentence approach assumes that the whole sentence must be taken into account, by adding a convolution layer after the first lookup layer, and is more suitable for tasks like SRL. We can train a neural network to maximize the log-likelihood over the training data. Denoting by θ the trainable parameters of the network, we want to maximize the following log-likelihood with respect to θ:</p><formula xml:id="formula_10">∑_i log p(t_i | c_i, θ)</formula><p>The score s(x, t, θ) of a sequence of tags t for a sentence x, with parameters θ, is given by the sum of the transition scores and the network scores: s(x, t, θ) = ∑_{i=1}^{n} (T(t_{i−1}, t_i) + f_θ(x_i, t_i))</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>where T(i, j) is the score for the transition from tag i to tag j, and f_θ(x_i, t_i) is the output of the network at word x_i for tag t_i. The probability of a tag sequence y for sentence x can be expressed as: p(y|x, θ) = e^{s(x,y,θ)} / ∑_t e^{s(x,t,θ)}</p></div>
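<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sequence score s(x, t, θ) can be computed directly from the network outputs and the transition scores. The following NumPy sketch is illustrative only, with the initial transition kept as a separate vector (it corresponds to transitions[-1] in the code of Section 4.2):</p><p>import numpy as np

def sequence_score(scores, transitions, init, tags):
    """scores[i, t]: network output f_theta for token i and tag t;
    transitions[a, b]: score of the transition from tag a to tag b;
    init[t]: score of starting the sentence with tag t."""
    total = init[tags[0]] + scores[0, tags[0]]
    for i in range(1, len(tags)):
        total += transitions[tags[i - 1], tags[i]] + scores[i, tags[i]]
    return total

scores = np.random.randn(4, 3)               # 4 tokens, 3 possible tags (toy values)
transitions = np.random.randn(3, 3)
init = np.random.randn(3)
print(sequence_score(scores, transitions, init, [0, 2, 1, 1]))</p></div>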
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">Experiments</head><p>We tested the DeepNL sequence tagger on the CoNLL 2003 challenge<ref type="foot" target="#foot_4">5</ref>, a NER benchmark based on Reuters data. The tagger was trained with three types of features: word embeddings from SENNA, a "caps" feature telling whether a word is in lowercase, uppercase, title case, or has at least one non-initial capital letter, and a gazetteer feature, based on the list provided by the organizers. The window size was set to 5, 300 hidden variables were used and training was iterated for 40 epochs. In the following table we report the scores compared with the system by Ando et al. <ref type="bibr" target="#b1">[2]</ref>, which uses a semi-supervised approach, and with the results of the released version of SENNA<ref type="foot" target="#foot_5">6</ref>. The slight difference with SENNA is possibly due to the use of different suffixes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Software Architecture</head><p>The DeepNL implementation is written in Cython and uses C++ code which exploits the Eigen<ref type="foot" target="#foot_6">7</ref> library for efficient parallel linear algebra computations. Data is exchanged between NumPy arrays in Python and Eigen matrices by means of Eigen Map types. On the Cython side, a pointer to the location where the data of a NumPy array is stored is obtained with a call like:</p><p>&lt;FLOAT_t*&gt;np.PyArray_DATA(self.nn.hidden_weights)</p><p>and passed to a C++ method. On the C++ side this is turned into an Eigen matrix, with no computational cost due to conversion or allocation, with the code:</p><p>Map&lt;Matrix&gt; hidden_weights(hidden_weights, numHidden, numInput)</p><p>which interprets the pointer to double values as a matrix with numHidden rows and numInput columns.</p></div>
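<div xmlns="http://www.tei-c.org/ns/1.0"><p>The zero-copy reinterpretation is analogous to taking a NumPy view over an existing buffer: no data is copied and changes are visible through both handles. A minimal Python analogy (not the actual Cython/Eigen binding):</p><p>import numpy as np

flat = np.zeros(12)           # the raw buffer, as seen from Python
M = flat.reshape(3, 4)        # a view over the same memory, much like Eigen's Map over a pointer
M[0, 0] = 1.0
assert flat[0] == 1.0         # the update is visible through the original array
print(flat.ctypes.data == M.ctypes.data)   # True: both handles share one underlying buffer</p></div>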
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Feature Extractors</head><p>The library has a modular architecture that allows customizing a network for specific tasks, in particular its first layer, by supplying extractors for various types of features. An extractor is defined as a class that inherits from an abstract class with the following interface:</p><p>class Extractor(object):
    def extract(self, tokens): ...
    def lookup(self, feature): ...
    def save(self, file): ...
    def load(self, file): ...</p><p>Method extract, applied to a list of tokens, extracts features from each token and returns a list of IDs for those features. Method lookup returns the vector of weights for a given feature. Methods save/load allow saving and reloading the Extractor data to/from disk. Extractors currently include an Embeddings extractor, implementing the word lookup feature, a Caps, Prefix and Postfix extractor for dealing with capitalization and prefix/postfix features, a Gazetteer extractor for dealing with the gazetteers typically used in a NER, and a customizable AttributeFeature extractor that extracts features from the state of a Shift/Reduce dependency parser, i.e. from the tokens in the stack or buffer, as described for example in <ref type="bibr" target="#b20">[21]</ref>.</p></div>
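<div xmlns="http://www.tei-c.org/ns/1.0"><p>For instance, a new feature type can be plugged in by subclassing Extractor. The SuffixExtractor below is a hypothetical example, not one of the extractors shipped with the library:</p><p>import pickle
import numpy as np

class SuffixExtractor(Extractor):            # hypothetical example, not part of DeepNL
    """Maps each token to an ID for its suffix and keeps a weight vector per suffix."""

    def __init__(self, size=2, dims=5, max_suffixes=1000):
        self.size = size
        self.ids = {}                                        # suffix -> feature ID
        self.weights = np.random.randn(max_suffixes, dims)   # one vector per suffix ID

    def extract(self, tokens):
        return [self.ids.setdefault(t[-self.size:], len(self.ids)) for t in tokens]

    def lookup(self, feature):
        return self.weights[feature]

    def save(self, file):
        pickle.dump((self.size, self.ids, self.weights), file)

    def load(self, file):
        self.size, self.ids, self.weights = pickle.load(file)</p></div>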
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Parallel gradient computation</head><p>The computation of the gradients during network training requires computing the conditional probability over all possible sequences of tags, whose number grows exponentially with the length of the sequence. These probabilities can however be computed in linear time by accumulating them in a matrix, and the matrix computation can then be parallelized, as in the following code:</p><formula xml:id="formula_11">delta = scores
delta[0] += transitions[-1]
tr = transitions[:-1]
for i in xrange(1, len(delta)):
    # sum by columns
    logadd = logsumexp(delta[i-1][:, newaxis] + tr, 0)
    delta[i] += logadd</formula><p>The array scores[i, j] contains the output of the neural network for the i-th element of the sequence and for tag j, delta[i, j] represents the sum of all scores ending at the i-th token with tag j, and transitions[i, j] contains the current estimate of the probability of a transition from tag i to tag j (the last row, transitions[-1], holds the scores for the initial transition). The computation can be optimized and parallelized using suitable linear algebra libraries. We implemented two versions of the network trainer, one in Python using NumPy<ref type="foot" target="#foot_7">8</ref> and one in C++ using Eigen<ref type="foot" target="#foot_8">9</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Identification of Idiomatic Multiword Expressions</head><p>As an application of word embeddings, we experiment with the identification of idiomatic multiword expressions. Multiword expressions (MWEs) are combinations of two or more words which can be syntactically and/or semantically idiosyncratic in nature. There are many varieties of multiword expressions: we concentrate on non-decomposable idioms, i.e. those idioms whose meaning cannot be assigned to the parts of the MWE.</p><p>MWE identification is typically split into two phases: candidate identification and filtering.</p><p>For identifying potential MWE candidates one can exploit the technique for discovering collocations <ref type="bibr" target="#b4">[5]</ref> based on Pointwise Mutual Information.</p><p>We adopt the simple variant proposed by <ref type="bibr">Mikolov et al. (2013a)</ref> of computing a score for the likelihood of forming a collocation, using the unigram and bigram counts:</p><formula xml:id="formula_12">score(w_i, w_j) = (count(w_i, w_j) − δ) / (count(w_i) ⋅ count(w_j))</formula><p>The bigrams with a score above a chosen threshold are then used as phrases. The δ is a discounting coefficient that prevents generating too many phrases consisting of very infrequent words. We also apply a cutoff on the frequency of bigrams, to avoid depending too much on the frequency of the individual words and in particular to limit the tendency of assigning higher scores to lower frequency words.</p><p>The process is repeated a few times, replacing the bigrams with a single token and decreasing the threshold value, in order to extract longer phrases.</p><p>As many have noted <ref type="bibr" target="#b16">[17]</ref>, just relying on statistical measures of frequency for identifying MWEs does not achieve very satisfactory results, since idiomatic phrases are not very frequent in texts and hence data is sparse; therefore some sort of semantic knowledge is required.</p><p>Srivastava and Hovy <ref type="bibr" target="#b23">[24]</ref> introduce a segmentation model for partitioning a sentence into linear constituents, called motifs, which is learned through semi-supervised learning. They then build embeddings for such motifs using the Hellinger PCA technique of Lebret and Collobert <ref type="bibr" target="#b14">[15]</ref>.</p><p>For deciding whether a candidate collocation is indeed a phraseme, we rely on the distinctive properties of idiomatic expressions: non-composability, i.e. their meaning is not obtainable as a composition of the meaning of their parts; and non-substitutivity, i.e. replacing near-synonyms for the parts of the phrase would produce something weird or nonsensical.</p><p>We assume that these two aspects should be fairly evident to the reader, who otherwise would not be able to distinguish a phraseme from a normal phrasal combination. Therefore, if we replace some of the words in the expression with similar words, we should end up with an apparently weird combination.</p><p>The basic idea in our experiments is to select replacement words that are similar according to their distance in the word embedding space. As a criterion for deciding whether a phrase is unusual, we first check whether no variant occurs in the corpus; otherwise we check whether the LM probability of all variants is below a given threshold.</p></div>
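<div xmlns="http://www.tei-c.org/ns/1.0"><p>A sketch of the scoring step (illustrative only, with toy count dictionaries in place of the actual corpus pipeline):</p><p>def phrase_score(w1, w2, unigram, bigram, delta=5, min_count=10):
    """Mikolov-style score for merging the bigram (w1, w2) into a single phrase token.
    unigram, bigram: dictionaries of corpus counts; delta: discounting coefficient;
    min_count: cutoff on the bigram frequency."""
    c = bigram.get((w1, w2), 0)
    if c &lt; min_count:                    # frequency cutoff
        return 0.0
    return (c - delta) / (unigram[w1] * unigram[w2])

unigram = {"new": 500, "york": 300, "red": 400, "car": 350}     # toy counts
bigram = {("new", "york"): 290, ("red", "car"): 12}
print(phrase_score("new", "york", unigram, bigram))   # far higher than the score for "red car"
print(phrase_score("red", "car", unigram, bigram))</p></div>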
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">Experiments</head><p>We carried out experiments on the corpus consisting of the plain text extracted from the English Wikipedia, for a total of 1,096,243,235 tokens, 4,456,972 of them distinct. We created word embeddings on the corpus obtained by performing the token combinations described above, using a threshold of 500 on the first iteration and 300 on the following ones. We used a cutoff of 80 on the first iteration and 40 on the following ones. The vocabulary for this corpus consists of 225,000 words or phrases.</p><p>For evaluating our model, we used the WikiMwe corpus <ref type="bibr" target="#b10">[11]</ref>, which includes a gold evaluation set consisting of 2,500 expressions, annotated in four categories: non-compositional, collocation, regular natural language phrase and ungrammatical. Table <ref type="table" target="#tab_1">2</ref> shows the results of our experiments. An online demo of a similar system for the identification of Italian idiomatic phrases is available at: http://tanl.di.unipi.it/embeddings/mwe. A potential application of the technique is the identification of chunks in search queries or in AdWords queries, in order to recognize expressions whose intended meaning does not correspond to the combination of the individual words in the query.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions</head><p>We have presented the architecture of DeepNL, a library for building NLP applications based on a deep learning architecture. The toolkit includes various methods for creating embeddings, either generic embeddings or sentiment-specific and context-sensitive embeddings.</p><p>As an example of the effectiveness of the embeddings, we have explored their use in the identification of idiomatic multiword expressions.</p><p>The implementation is written in Python/Cython and uses C++ linear algebra libraries for efficiency and scalability, exploiting multithreading or GPUs where available. The code is available for download from: https://github.com/attardi/deepnl.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Table 1. Results on the identification of idiomatic expressions.</head><p>There are several potential applications for the library; in particular, sentiment-specific word embeddings might be applied to other classification tasks, for example detecting tweets that signal dangers or disasters.</p><p>Context-sensitive word embeddings can be exploited in artificial tasks like word sense disambiguation or word sense similarity. Hopefully they will also provide benefits for more relevant tasks such as relation extraction, negation identification, data linking, and ontology creation. Context-aware embeddings have indeed been applied effectively to matching ad words to queries by Grbovic et al. <ref type="bibr" target="#b9">[10]</ref>.</p><p>We hope that the availability of the code will encourage exploring their use in further applications.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1.</head><label>1</label><figDesc>Figure 1. Overview of the model for context sensitive word embeddings. U and D are the matrices of weights to be learned.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2.</head><label>2</label><figDesc>The results are encouraging, since the best level of accuracy reported in Vincze et al. <ref type="bibr" target="#b26">[27]</ref> is 55.75% F1 for noun compounds. Table 2 shows a few examples of the output of our system on the WikiMwe test set: samples of phrases and types assigned by our system.</figDesc><table><row><cell></cell><cell>Precision</cell><cell>Recall</cell><cell>F1</cell></row><row><cell>MWE</cell><cell>53.61</cell><cell>58.73</cell><cell>55.05</cell></row><row><cell>Regular</cell><cell>51.36</cell><cell>66.60</cell><cell>58.00</cell></row><row><cell>Ungrammatical</cell><cell>5.48</cell><cell>40.00</cell><cell>9.64</cell></row><row><cell>Phrase</cell><cell>ngrams</cell><cell>LM prob.</cell><cell>type</cell><cell>Correct</cell></row><row><cell>dual gauge</cell><cell>232</cell><cell>-2.1</cell><cell>MWE</cell><cell>yes</cell></row><row><cell>art of being</cell><cell>0</cell><cell>-2.9</cell><cell>MWE</cell><cell>yes</cell></row><row><cell>protest against the war</cell><cell>0</cell><cell>-2.0</cell><cell>Colloc.</cell><cell>yes</cell></row><row><cell>way to Damascus</cell><cell>0</cell><cell>-3.7</cell><cell>Colloc.</cell><cell>yes</cell></row><row><cell>financial services</cell><cell>0</cell><cell>-1.8</cell><cell>Colloc.</cell><cell>no</cell></row><row><cell>androgenic alopecia</cell><cell>0</cell><cell>0.0</cell><cell>MWE</cell><cell>yes</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://ronan.collobert.com/senna/, http://metaoptimize.com/projects/wordreprs/, http://www.fit.vutbr.cz/˜imikolov/rnnlm/, http://ai.stanford.edu/˜ehhuang/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://bitbucket.org/aboSamoor/word2embeddings</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://deeplearning.net/software/theano/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">https://code.google.com/p/word2vec</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://www.cnts.ua.ac.be/conll2003/ner/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">http://ml.nec-labs.com/senna/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">http://eigen.tuxfamily.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">http://www.numpy.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">http://eigen.tuxfamily.org/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Acknowledgements. Partial support for this work was provided by project RIS (POR RIS of the Regione Toscana, CUP n° 6408.30122011.026000160).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Polyglot: Distributed Word Representations for Multilingual NLP</title>
		<author>
			<persName><forename type="first">R</forename><surname>Al-Rfou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Perozzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Skiena</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1307.1662</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A framework for learning predictive structures from multiple tasks and unlabeled data</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">K</forename><surname>Ando</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Bartlett</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="1817" to="1853" />
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Fast and Space-efficient Entity Linking in Queries</title>
		<author>
			<persName><forename type="first">Roi</forename><surname>Blanco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Giuseppe</forename><surname>Ottaviano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Edgar</forename><surname>Meij</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015. 2015</date>
			<publisher>ACM WSDM</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Fast and Accurate Dependency Parser using Neural Networks</title>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of EMNLP</title>
				<meeting>of EMNLP</meeting>
		<imprint>
			<date type="published" when="2014">2014. 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Word association norms, mutual information, and lexicography</title>
		<author>
			<persName><forename type="first">K</forename><surname>Church</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hanks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="22" to="29" />
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Natural Language Processing (Almost) from Scratch</title>
		<author>
			<persName><forename type="first">R</forename><surname>Collobert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2461" to="2505" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">A unified architecture for natural language processing: Deep neural networks with multitask learning</title>
		<author>
			<persName><forename type="first">R</forename><surname>Collobert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008. 2008</date>
			<publisher>ICML</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Deep Learning for Efficient Discriminative Parsing</title>
		<author>
			<persName><forename type="first">R</forename><surname>Collobert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">AISTATS</title>
				<imprint>
			<date type="published" when="2011">2011. 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Multiview learning of word embeddings via CCA</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Dhillon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Foster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ungar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems (NIPS)</title>
				<imprint>
			<date type="published" when="2011">2011</date>
			<biblScope unit="volume">24</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Context-and Content-aware Embeddings for Query Rewriting in Sponsored Search</title>
		<author>
			<persName><forename type="first">M</forename><surname>Grbovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Djuric</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Radosavljevic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Silvestri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bhamidipati</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of SIGIR 2015</title>
				<meeting>SIGIR 2015<address><addrLine>Santiago, Chile</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Mining Multiword Terms from Wikipedia</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hartmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Szarvas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Gurevych</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Semi-Automatic Ontology Development: Processes and Resources</title>
				<editor>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Pazienza</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Stellato</surname></persName>
		</editor>
		<meeting><address><addrLine>Hershey, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>IGI Global</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="226" to="258" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Improving Word Representations via Global Context and Multiple Word Prototypes</title>
		<author>
			<persName><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the Association for Computational Linguistics 2012 Conference</title>
				<meeting>of the Association for Computational Linguistics 2012 Conference</meeting>
		<imprint>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Distributed representations</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">E</forename><surname>Hinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Mcclelland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">E</forename><surname>Rumelhart</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Parallel distributed processing: Explorations in the microstructure of cognition</title>
				<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="1986">1986. 1986</date>
			<biblScope unit="volume">1</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Distributed Representations of Sentences and Documents</title>
		<author>
			<persName><forename type="first">Quoc</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st International Conference on Machine Learning</title>
				<meeting>the 31st International Conference on Machine Learning<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014. 2014</date>
			<biblScope unit="volume">32</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Word Embeddings through Hellinger PCA</title>
		<author>
			<persName><forename type="first">Rémi</forename><surname>Lebret</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ronan</forename><surname>Collobert</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of EACL</title>
				<meeting>of EACL</meeting>
		<imprint>
			<date type="published" when="2013">2013. 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Neural Word Embeddings as Implicit Matrix Factorization</title>
		<author>
			<persName><forename type="first">Omer</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoav</forename><surname>Goldberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems (NIPS)</title>
				<imprint>
			<date type="published" when="2014">2014. 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Foundations of Statistical Natural Language Processing</title>
		<author>
			<persName><forename type="first">Christopher</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hinrich</forename><surname>Schütze</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
			<publisher>The MIT Press</publisher>
			<pubPlace>Cambridge, Massachusetts</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Recurrent neural network based language model</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Karafiat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Burget</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cernocky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sanjeev</forename><surname>Khudanpur</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association</title>
				<meeting><address><addrLine>Makuhari, Chiba, Japan</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Efficient Estimation of Word Representations in Vector Space</title>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Workshop at ICLR</title>
				<meeting>Workshop at ICLR</meeting>
		<imprint>
			<date type="published" when="2013">2013. 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Distributed Representations of Words and Phrases and their Compositionality</title>
		<author>
			<persName><forename type="first">Tomas</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ilya</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kai</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Greg</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jeffrey</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of NIPS</title>
				<meeting>NIPS</meeting>
		<imprint>
			<date type="published" when="2013">2013. 2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Algorithms for deterministic incremental dependency parsing</title>
		<author>
			<persName><forename type="first">Joakim</forename><surname>Nivre</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="513" to="553" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Multiword expressions in the wild? the mwetoolkit comes in handy</title>
		<author>
			<persName><forename type="first">Carlos</forename><surname>Ramisch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aline</forename><surname>Villavicencio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Christian</forename><surname>Boitet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 23rd COLING (COLING 2010) -Demonstrations</title>
				<editor>
			<persName><forename type="first">Yang</forename><surname>Liu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">Ting</forename><surname>Liu</surname></persName>
		</editor>
		<meeting>of the 23rd COLING (COLING 2010) -Demonstrations<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="57" to="60" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Software Framework for Topic Modelling with Large Corpora</title>
		<author>
			<persName><forename type="first">Radim</forename><surname>Řehůřek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Petr</forename><surname>Sojka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</title>
				<meeting>the LREC 2010 Workshop on New Challenges for NLP Frameworks<address><addrLine>Valletta, Malta</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="45" to="50" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Vector space semantics with frequency-driven motifs</title>
		<author>
			<persName><forename type="first">S</forename><surname>Srivastava</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Hovy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 52nd Annual Meeting of the Association for Computational Linguistics<address><addrLine>Baltimore, Maryland, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="634" to="643" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification</title>
		<author>
			<persName><surname>Tang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 52nd Annual Meeting of the Association for Computational Linguistics<address><addrLine>Baltimore, Maryland, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014-06-23">2014. June 23-25 2014</date>
			<biblScope unit="page" from="1555" to="1565" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Word representations: a simple and general method for semi-supervised learning</title>
		<author>
			<persName><forename type="first">Joseph</forename><surname>Turian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lev</forename><surname>Ratinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 48th annual meeting of the association for computational linguistics</title>
				<meeting>the 48th annual meeting of the association for computational linguistics</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="384" to="394" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Detecting noun compounds and light verb constructions: a contrastive study</title>
		<author>
			<persName><forename type="first">Veronika</forename><surname>Vincze</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">István</forename><surname>Nagy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Gábor</forename><surname>Berend</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World (MWE &apos;11)</title>
				<meeting>the Workshop on Multiword Expressions: from Parsing and Generation to the Real World (MWE &apos;11)<address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2011">2011</date>
			<biblScope unit="page" from="116" to="121" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
