<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">In Search for Linear Relations in Sentence Embedding Spaces</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Petra</forename><surname>Barančíková</surname></persName>
							<email>barancikova@ufal.mff.cuni.cz</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics</orgName>
								<orgName type="institution">Charles University</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ondřej</forename><surname>Bojar</surname></persName>
							<email>bojar@ufal.mff.cuni.cz</email>
							<affiliation key="aff0">
								<orgName type="department">Faculty of Mathematics and Physics Institute of Formal and Applied Linguistics</orgName>
								<orgName type="institution">Charles University</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">In Search for Linear Relations in Sentence Embedding Spaces</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">49F04232B1CF9624C058365FD4627248</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T23:35+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We present an introductory investigation into continuous-space vector representations of sentences. Using a simple pattern method, we acquire from natural language inference datasets pairs of very similar sentences that differ only by a small alteration (such as a change of a noun, or the addition of an adjective, a noun or punctuation). We look into how such a small change within the sentence text affects its representation in the continuous space and how such alterations are reflected by some of the popular sentence embedding models. We find that the vector differences of some embeddings indeed reflect small changes within a sentence.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Continuous-space representations of sentences, so-called sentence embeddings, are becoming an interesting object of study; consider, e.g., the BlackBox workshop. Representing sentences in a continuous space, i.e. commonly with a long vector of real numbers, can be useful in multiple ways, analogous to continuous word representations (word embeddings). Word embeddings have demonstrably made downstream processing robust to unimportant input variations or minor errors (sometimes including typos), they have greatly boosted the performance of many tasks in low-data conditions, and they can form the basis of empirically-driven lexicographic explanations of word meanings.</p><p>One notable observation was made in <ref type="bibr" target="#b14">[15]</ref>, showing that several interesting relations between words have their immediate geometric counterpart in the continuous vector space.</p><p>Our aim is to examine existing continuous representations of whole sentences, looking for an analogous behaviour. The idea of what we are hoping for is illustrated in Figure <ref type="figure">1</ref>. As with words, we would like to learn if and to what extent some simple geometric operations in the continuous space correspond to simple semantic operations on the sentence strings. Similarly to <ref type="bibr" target="#b14">[15]</ref>, we deliberately do not include this aspect in the training objective of the sentence representations but instead search for properties that are learned in an unsupervised way, as a side-effect of the original training objective, data and setup.</p><p>Figure <ref type="figure">1</ref>: An illustration of a continuous multidimensional vector space representing individual sentences, a 'space of sentences' (upper plot), where each sentence is represented as a dot. Pairs of related sentences are connected with arrows; dashing indicates various relation types. 
The lower plot illustrates a possible 'space of operations' (here vector difference, so all arrows are simply moved to start at a common origin). The hope is that similar operations (e.g. all vector transformations extracted from sentence pairs differing in the speed of travel, "running" instead of "walking") would be represented close to each other in the space of operations, i.e. form a more or less compact cluster.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This approach has the potential to explain the good or bad performance of the examined types of representations in various tasks. The paper is structured as follows: Section 2 reviews the closest related work. Sections 3 and 4, respectively, describe the dataset of sentences and the sentence embedding methods we use. Section 5 presents the selection of operations on the sentence vectors. Section 6 provides the main experimental results of our work. We conclude in Section 7.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>A series of tests measuring how well word embeddings capture semantic and syntactic information is defined in <ref type="bibr" target="#b14">[15]</ref>. These tests include, for example, gradation of adjectives ("easy"→"easier"→"easiest"), changing the tense of a verb ("walking"→"walk"), or getting the capital ("Athens"→"Greece") or currency of a state ("Angola"→"kwanza"). References [2; 13] have further refined the support of sub-word units, leading to considerable improvements in representing morpho-syntactic properties of words. Vylomova, Rimmel, Cohn and Baldwin <ref type="bibr" target="#b25">[26]</ref> largely extended the set of considered semantic relations of words.</p><p>Figure <ref type="figure">2</ref>: Example of our pattern extraction method. In the first step, the longest common subsequence of tokens (ear is playing a guitar .) is found and replaced with the variable X. In the second step, with a tattoo behind is substituted with the variable Y. As the variables are not listed alphabetically in the premise, they are switched in the last step. Step 1: premise "a man with a tattoo behind his ear is playing a guitar ."; hypothesis "a woman with a tattoo behind her ear is playing a guitar ."</p><p>Sentence embeddings are most commonly evaluated extrinsically in so-called 'transfer tasks', i.e. comparing the evaluated representations based on their performance in sentence sentiment analysis, question type prediction, natural language inference and other assignments. Reference <ref type="bibr" target="#b7">[8]</ref> introduces 'probing tasks' for the intrinsic evaluation of sentence embeddings. These tasks measure to what extent linguistic features like sentence length, word order, or the depth of the syntactic tree are available in a sentence embedding. 
This work was extended to SentEval <ref type="bibr" target="#b5">[6]</ref>, a toolkit for evaluating the quality of sentence embeddings both intrinsically and extrinsically. It contains 17 transfer tasks and 10 probing tasks. SentEval has been applied to many recent sentence embedding techniques, showing that no method has a consistently good performance across all tasks <ref type="bibr" target="#b17">[18]</ref>.</p><p>Voleti, Liss and Berisha <ref type="bibr" target="#b24">[25]</ref> examine how errors in a sentence (such as incorrect word substitutions caused by automatic speech recognition) affect its embedding. The embeddings of corrupted sentences are used in textual similarity tasks and the performance is compared with that of the original embeddings. The results suggest that pretrained neural sentence encoders are much more robust to the introduced errors than bag-of-words embeddings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Examined Sentences</head><p>Because the manual creation of sentence variations is costly, we reuse existing data from SNLI <ref type="bibr" target="#b2">[3]</ref> and MultiNLI <ref type="bibr" target="#b26">[27]</ref>. Both collections consist of pairs of sentences (a premise and a hypothesis) and their relationship (entailment/contradiction/neutral). The two datasets together contain 982k unique sentence pairs. All sentences were lowercased and tokenized using NLTK <ref type="bibr" target="#b13">[14]</ref>.</p><p>From all the available sentence pairs, we select only the subset where the difference between the sentences in a pair can be described with a simple pattern. Our method goes as follows: given two sentences, a premise p and the corresponding hypothesis h, we find the longest common substring consisting of whole words and replace it with a variable. This is repeated once more, so our sentence patterns can have up to two variables. In the last step, we make sure the pattern is in a canonical form by switching the variables if needed, so that they are alphabetically sorted in p. The process is illustrated in Figure <ref type="figure">2</ref>.</p><p>The ten most common patterns for each NLI relation are shown in Figure <ref type="figure">3</ref>. Many of the obtained patterns clearly match the sentence pair label. For instance, pattern no. 2 ("X man Y → X person Y") can be expected to lead to a sentence pair illustrating entailment: if a man appears in a story, we can infer that a person appeared in the story. The contradictions illustrate typical oppositions like man-woman or dog-cat. 
Neutrals are various refinements of the content described by the sentences, probably in part due to the original SNLI instruction that the hypothesis "might be true" given the premise in the neutral relation.</p><p>We kept only the patterns appearing with at least 20 different sentence pairs in order to have large and varied sets of sentence pairs for the subsequent experiments. We also ignored the overall most common pattern, namely the identity, because it does not alter the sentence at all. Strangely enough, identity was observed not just among entailment pairs (693 cases), but also in neutral (41 cases) and contradiction (22 cases) pairs.</p><p>Altogether, we collected 4.2k unique sentence pairs in 60 patterns. Only 10% of this data comes from MultiNLI; the majority is from SNLI.</p></div>
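The extraction procedure described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `extract_pattern` and the use of `difflib` are our assumptions, not the actual implementation used for the experiments.

```python
from difflib import SequenceMatcher

def extract_pattern(premise, hypothesis):
    """Sketch: replace up to two longest common token runs with X and Y."""
    p, h = premise.lower().split(), hypothesis.lower().split()
    for var in ("X", "Y"):
        # Longest common contiguous run of whole tokens.
        m = SequenceMatcher(a=p, b=h).find_longest_match(0, len(p), 0, len(h))
        if m.size == 0:
            break
        p = p[:m.a] + [var] + p[m.a + m.size:]
        h = h[:m.b] + [var] + h[m.b + m.size:]
    # Canonical form: the variables must appear alphabetically in the premise.
    if "X" in p and "Y" in p and p.index("Y") < p.index("X"):
        swap = {"X": "Y", "Y": "X"}
        p = [swap.get(t, t) for t in p]
        h = [swap.get(t, t) for t in h]
    return " ".join(p) + " -> " + " ".join(h)
```

Applied to the example of Figure 2, this sketch yields the pattern "a man X his Y -> a woman X her Y", matching the variable switch described there.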
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Sentence Embeddings</head><p>We experiment with several popular pretrained sentence embeddings.</p><p>InferSent<ref type="foot" target="#foot_0">2</ref> <ref type="bibr" target="#b6">[7]</ref> is the first embedding model that used supervised learning to compute sentence representations. It was trained to predict inference labels on the SNLI dataset. The authors tested 7 different architectures, and a BiLSTM encoder with max pooling achieved the best results. InferSent comes in two versions: InferSent_1 is trained with GloVe embeddings <ref type="bibr" target="#b16">[17]</ref> and InferSent_2 with fastText <ref type="bibr" target="#b1">[2]</ref>. InferSent representations are by far the largest, with a dimensionality of 4,096 in both versions.</p><p>Similarly to InferSent, the Universal Sentence Encoder <ref type="bibr" target="#b3">[4]</ref> uses unsupervised learning augmented with training on supervised data from SNLI. There are two models available. USE_T<ref type="foot" target="#foot_1">3</ref> is a transformer network <ref type="bibr" target="#b22">[23]</ref> designed for higher accuracy at the cost of larger memory use and computational time. USE_D<ref type="foot" target="#foot_2">4</ref> is a deep averaging network <ref type="bibr" target="#b11">[12]</ref>, where word and bigram embeddings are averaged and used as input to a deep neural network that computes the final sentence embedding. This second model is faster and more efficient, but its accuracy is lower. Both models output representations with 512 dimensions.</p><p>Unlike the previous models, BERT <ref type="foot" target="#foot_3">5</ref> (Bidirectional Encoder Representations from Transformers) <ref type="bibr" target="#b9">[10]</ref> is a deep unsupervised language representation, pre-trained using only unlabeled text. It has two self-supervised training objectives: masked language modelling and next sentence classification. 
It is considered bidirectional as the Transformer encoder reads the entire sequence of words at once. We use a pre-trained BERT-Large model with Whole Word Masking. BERT gives an embedding for every (sub)word unit; as the sentence embedding we take the embedding of the [CLS] token, which is inserted at the beginning of every sentence. BERT embeddings have 1,024 dimensions.</p><p>ELMo<ref type="foot" target="#foot_4">6</ref> (Embeddings from Language Models) <ref type="bibr" target="#b4">[5]</ref> uses representations from a biLSTM that is trained with a language model objective on a large text dataset. Its embeddings are a function of the internal layers of the bidirectional language model (biLM), which should capture not only semantics and syntax, but also the different meanings a word can represent in different contexts (polysemy). Similarly to BERT, each ELMo token representation is a function of the entire input sentence: one word gets different embeddings in different contexts. ELMo computes an embedding for every token, and we compute the final sentence embedding as the average over all tokens. It has a dimensionality of 1,024.</p><p>LASER<ref type="foot" target="#foot_5">7</ref> (Language-Agnostic SEntence Representations) <ref type="bibr" target="#b0">[1]</ref> is a five-layer bidirectional LSTM (BiLSTM) network. The 1,024-dimensional vectors are obtained by max-pooling over its last states. It was trained to translate from more than 90 languages to English or Spanish at the same time; the source language was selected randomly in each batch.</p></div>
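The three ways of turning token-level vectors into a single sentence vector mentioned in this section (ELMo's token averaging, LASER's max-pooling, BERT's [CLS] position) can be illustrated on a toy token matrix. The random matrix below merely stands in for real model outputs; it is not data from any of these models.

```python
import numpy as np

# Toy token-level embeddings for a 5-token sentence; dimensionality 1024
# as in ELMo/LASER. A real model would produce these from the text.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 1024))

elmo_style = tokens.mean(axis=0)   # ELMo here: average over all token vectors
laser_style = tokens.max(axis=0)   # LASER: element-wise max-pooling
bert_style = tokens[0]             # BERT: vector at the [CLS] (first) position
```

All three strategies map a variable-length sequence of token vectors to one fixed-size sentence vector, which is what makes the vector-arithmetic experiments in the following sections possible.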
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Choosing Vector Operations</head><p>Mikolov, Chen, Corrado and Dean <ref type="bibr" target="#b14">[15]</ref> used a simple vector difference as the operation that relates two word embeddings. For sentence embeddings, we experiment a little and consider four simple operations: addition, subtraction, multiplication and division, all applied elementwise. More operations could also be considered, as long as they are reversible, so that we can isolate the vector change for a particular sentence alteration and apply it to the embedding of any other sentence. Hopefully, we would then land in the area where the correspondingly altered sentence is embedded.</p><p>The underlying idea of our analysis was already sketched in Figure <ref type="figure">1</ref>. From every sentence pair in our dataset, we extract the pattern, i.e. the string edit of the sentences. The arithmetic operation needed to move from the embedding of the first sentence to the embedding of the second sentence (in the continuous space of sentences) can be represented as a point in what we call the space of operations. Considering all sentence pairs that share the same edit pattern, we obtain many points in the space of operations. If the space of sentences reflects the particular edit pattern in an accessible way, all the corresponding points in the space of operations will be close together, forming a cluster.</p><p>To select which of the arithmetic operations best suits the data, we test pattern clustering with three common clustering performance evaluation methods.</p><p>Table <ref type="table">1</ref>: The quality of pattern clustering in terms of the three cluster evaluation measures in the space of operations. For all the scores, the value of 1 represents a perfect assignment and 0 corresponds to a random label assignment. All the numbers were computed using the Scikit-learn library <ref type="bibr" target="#b15">[16]</ref>. The best operation according to each cluster score across the various embeddings is in bold.</p><p>• V-measure <ref type="bibr" target="#b18">[19]</ref> is the harmonic mean of homogeneity (each cluster should contain only members of one class) and completeness (all members of one class should be assigned to the same cluster). The score ranges from 0 (the worst situation) to 1 (a perfect score).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>• Adjusted Mutual Information <ref type="bibr" target="#b20">[21]</ref> measures the agreement of two clusterings with a correction for agreement by chance. A random label assignment gets a score close to 0, while two identical clusterings get a score of 1.</p><p>As a detailed description of these measures is out of the scope of this article, we refer readers to the related literature (e.g. <ref type="bibr" target="#b23">[24]</ref>). We use these scores to compare the patterns with the labels predicted by k-Means (best result of 100 random initialisations). The results are presented in Table <ref type="table">1</ref>. It is apparent that the best clustering by far is achieved using the most intuitive operation, vector subtraction.</p><p>There seems to be a weak correlation between the size of the embeddings and the scores. The smallest embeddings, USE_D and USE_T, get the worst scores, while the largest embeddings, InferSent_1, score best. However, InferSent_2, also with dimensionality 4,096, performs poorly. The fact that several of the embeddings were trained on SNLI does not seem to benefit those embeddings: among the three top-scoring embeddings, only InferSent_1 was trained on the data that we use for the evaluation.</p></div>
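The selection procedure can be sketched as follows: cluster the per-pair operation vectors with k-Means and score the result against the pattern labels using Scikit-learn. The data here are synthetic stand-ins for real sentence embeddings, constructed so that each hypothesis vector is its premise vector plus a pattern-specific shift; it is an illustration of the scoring machinery, not a reproduction of Table 1.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score, adjusted_mutual_info_score

rng = np.random.default_rng(0)
# Synthetic stand-in: 3 edit patterns x 30 sentence pairs, 16-dim vectors.
shift = rng.normal(size=(3, 1, 16))      # one shift per pattern
prem = rng.normal(size=(3, 30, 16))      # premise embeddings
hyp = prem + shift                       # hypothesis = premise + pattern shift
labels = np.repeat(np.arange(3), 30)     # true pattern labels

scores = {}
for name, op in [("subtraction", hyp - prem), ("addition", hyp + prem)]:
    pred = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
        op.reshape(-1, 16))
    scores[name] = (v_measure_score(labels, pred),
                    adjusted_mutual_info_score(labels, pred))
```

On this toy data, subtraction recovers the pattern-specific shift exactly, so its points cluster perfectly (both scores reach 1.0), while addition mixes the premise content back in and scores far lower.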
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Experiments</head><p>For the following exploration of the continuous space of operations, we focus only on the ELMo embeddings. They scored second best on all scores, but unlike the best-scoring InferSent_1, ELMo was not trained on SNLI, which is the major source of our sentence pairs. The t-SNE <ref type="bibr" target="#b21">[22]</ref> visualisation of the subtractions of ELMo vectors is presented in Figure <ref type="figure" target="#fig_1">4</ref>. The visualisation is constructed automatically and, of course, without knowledge of the pattern labels. It shows that the patterns are generally grouped into compact clusters, with the exception of a 'chaos cloud' in the middle and several outliers. There are also several patterns that seem inseparable, e.g. "two X → X" and "three X → X", or "X white Y -&gt; X Y" and "X black Y -&gt; X Y".</p><p>We identified the patterns responsible for the noisy center and the outliers by computing the weighted inertia of each pattern (the sum of squared distances of the samples to their cluster center, divided by the sample size). The clusters with the highest inertia consist of patterns representing a change of word order and/or adding or removing punctuation. These patterns are:</p><formula xml:id="formula_0">X is Y . → Y is X
X Y . → Y X .
X → X .
X , Y . → Y X .
X , Y . → Y , X .
X Y . → Y , X .
X . → X</formula><p>To see if the space of operations can also be interpreted automatically, i.e. if the sentence relations are generalizable, we remove the noisy patterns listed above and apply fully unsupervised clustering: we do not even disclose the expected number of patterns, i.e. clusters. We try two metrics for finding the optimal number of clusters: the Davies-Bouldin index <ref type="bibr" target="#b8">[9]</ref> and the Silhouette Coefficient <ref type="bibr" target="#b19">[20]</ref>. They are both designed to measure compactness and separation of the clusters, i.e. 
they both reward dense clusters that are far from each other. The Davies-Bouldin index and the Silhouette Coefficient agree that the best separation is achieved at 9 clusters. Running k-Means with 9 clusters, we get the result plotted in Figure <ref type="figure" target="#fig_2">5</ref>.</p><p>Manually inspecting the contents of the automatically identified clusters, we see that many clusters are meaningful in some way. For instance, Cluster 1 captures 90% (altogether 264 out of 292) of the sentence pairs exhibiting the pattern of generalizing women, boys or girls to people. The counterpart for men belonging to people is spread into Cluster 5 (218 out of 227 pairs) for the singular case and the not-so-clean Cluster 7, which contains 57/57 of the plural pairs "X men Y → X people Y" together with various oppositions. Cluster 2 covers all sentence pairs where a person is replaced with a dog. Cluster 3 is primarily connected with sentence pairs introducing bad mood. Cluster 4 unites patterns that represent omitting a numeral/group. Cluster 6 covers gender oppositions in one direction and Cluster 9 adds the other direction (with some noise for child/man, person/man and similar), etc.</p></div>
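The model selection over the number of clusters can be sketched as follows: run k-Means for a range of candidate cluster counts and pick the count that maximises the Silhouette Coefficient and minimises the Davies-Bouldin index. The nine synthetic blobs on a grid below stand in for the operation-space points; they are fabricated for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score, silhouette_score

# Nine well-separated blobs stand in for nine pattern clusters.
centers = [(5 * i, 5 * j) for i in range(3) for j in range(3)]
X, _ = make_blobs(n_samples=450, centers=centers, cluster_std=0.3,
                  random_state=0)

scores = {}
for k in range(2, 15):
    pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = (silhouette_score(X, pred), davies_bouldin_score(X, pred))

k_silhouette = max(scores, key=lambda k: scores[k][0])      # higher is better
k_davies_bouldin = min(scores, key=lambda k: scores[k][1])  # lower is better
```

On data this clean, the two metrics agree on the true number of blobs (nine), mirroring the agreement of the two metrics reported above for the operation space.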
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">Conclusion and Future Work</head><p>We examined vector spaces of sentence representations as inferred automatically by sentence embedding methods such as InferSent or ELMo. Our goal was to find out if some simple arithmetic operations in the vector space correspond to meaningful edit operations on the sentence strings.</p><p>Our first explorations of 60 sentence edit patterns document that this is indeed the case. Automatically identified frequent patterns with 20 or more occurrences in the SNLI and MultiNLI datasets correspond to simple vector differences. The ELMo space (and others such as InferSent_1, LASER and USE_T, which are omitted due to paper length requirements) exhibits this property very well.</p><p>Unfortunately, choosing ELMo as the example might not have been the best option: we compute ELMo embeddings by averaging contextualized word embeddings, and the majority of the patterns just remove, add or change a single word. The difference between two such sentence embeddings may thus reduce to the difference between the embeddings of the substituted words, depending on the effect of the contextualization. The differences in the vector space would then reflect the word embeddings rather than the sentence embeddings.</p><p>It should be noted that our search made use of only about 0.5% of the sentence pairs available in SNLI and MultiNLI. The remaining sentence pairs differ beyond what was extractable automatically using our simple pattern method. A different approach to a fine-grained description of the semantic relation between two sentences would have to be taken to better exploit the available data.</p><p>Our plans for the long term are to further verify these observations using a more diverse set of vector operations and a larger set of sentence alterations, primarily by extending the set of alteration types. 
We also plan to examine the possibilities of generating sentence strings back from the sentence embedding space. If successful, our method could lead to controlled paraphrasing via the continuous space: take an input sentence, embed it, modify the embedding using a vector operation and generate the target sentence in the standard textual form.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>.</head><label></label><figDesc>..being sad, not little... ...running instead of walking... ...a man instead of a dog... ...a grown-up instead of a child...</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: t-SNE representation of patterns. The points in the operation space are obtained by subtracting the ELMo embedding of the hypothesis from the ELMo embedding of the premise. Best viewed in color. Colors correspond to the sentence patterns.</figDesc><graphic coords="5,56.69,224.41,481.88,478.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: t-SNE representation of patterns as in Figure4with colors coding now fully automatic clusters. Each cluster is labelled with the set of patterns extracted from sentence pairs assigned to the cluster. The numbers in parentheses indicate how many sentence pairs belong to the given pattern within this cluster and overall, resp. For instance the line "two X → X (52/56)" says that of the 56 sentence pairs differing in the prefix "two", 52 were automatically clustered together based on the subtraction of their ELMo embeddings.</figDesc><graphic coords="6,107.67,276.74,381.14,371.38" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">https://github.com/facebookresearch/InferSent</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_1">https://tfhub.dev/google/universal-sentence-encoder-large/3</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_2">https://tfhub.dev/google/universal-sentence-encoder/2</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_3">https://github.com/google-research/bert</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_4">https://github.com/HIT-SCIR/ELMoForManyLangs</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_5">https://github.com/facebookresearch/LASER</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_6">X Y -&gt; X sad Y
X young Y -&gt; X sad Y
X woman Y -&gt; X person Y
X girl Y -&gt; X person Y
X children Y -&gt; X men Y
X child Y -&gt; X man Y
X child Y -&gt; X person Y
X boy Y -&gt; X person Y
X red Y -&gt; X Y
X blue Y -&gt; X Y
X boy Y -&gt; X girl Y
X boys Y -&gt; X girls Y
X people Y -&gt; X men Y
X person Y -&gt; X man
X lady Y -&gt; X man Y
X woman Y -&gt; X man Y
X women Y -&gt; X men Y
X girl Y -&gt; X boy Y
X man Y -&gt; X woman Y
X man Y -&gt; X person Y
X -&gt; X .
X . -&gt; X
X -&gt; there is X
X Y -&gt; X is Y
a group of X -&gt; X
two X -&gt; X
three X -&gt; X
X men Y -&gt; X people Y
X men Y -&gt; X women Y
man X -&gt; woman X
X white Y -&gt; X Y
X black Y -&gt; X Y
X Y -&gt; X fat Y
X Y -&gt; X busy Y
X people Y -&gt; X dogs Y
X little Y -&gt; X sad Y</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgment</head><p>This work has been supported by the grant No. 18-24210S of the Czech Science Foundation. It has been using language resources and tools stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071).</p></div>
			</div>

			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0" />			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond</title>
		<author>
			<persName><forename type="first">M</forename><surname>Artetxe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schwenk</surname></persName>
		</author>
		<idno>CoRR, abs/1812.10464</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Enriching word vectors with subword information</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<idno>CoRR, abs/1607.04606</idno>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">A large annotated corpus for learning natural language inference</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Bowman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Angeli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Potts</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)</meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Universal sentence encoder</title>
		<author>
			<persName><forename type="first">D</forename><surname>Cer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Hua</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Limtiaco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>John</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Constant</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Guajardo-Cespedes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Tar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Strope</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kurzweil</surname></persName>
		</author>
		<idno>CoRR, abs/1803.11175</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Towards better UD parsing: Deep contextualized word embeddings, ensemble, and treebank concatenation</title>
		<author>
			<persName><forename type="first">W</forename><surname>Che</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
		<idno>CoRR, abs/1807.03121</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">SentEval: an evaluation toolkit for universal sentence representations</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kiela</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1803.05449</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Supervised learning of universal sentence representations from natural language inference data</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kiela</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Barrault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bordes</surname></persName>
		</author>
		<idno>CoRR, abs/1705.02364</idno>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">What you can cram into a single vector: Probing sentence embeddings for linguistic properties</title>
		<author>
			<persName><forename type="first">A</forename><surname>Conneau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kruszewski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Barrault</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Baroni</surname></persName>
		</author>
		<idno>CoRR, abs/1805.01070</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A cluster separation measure</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Davies</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">W</forename><surname>Bouldin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. Pattern Anal. Mach. Intell</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="224" to="227" />
			<date type="published" when="1979-02">Feb. 1979</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">BERT: pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno>CoRR, abs/1810.04805</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Comparing partitions</title>
		<author>
			<persName><forename type="first">L</forename><surname>Hubert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Arabie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Classification</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="193" to="218" />
			<date type="published" when="1985-12">Dec 1985</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Deep unordered composition rivals syntactic methods for text classification</title>
		<author>
			<persName><forename type="first">M</forename><surname>Iyyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Manjunatha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Boyd-Graber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Daumé</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)</title>
				<meeting>the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing<address><addrLine>Beijing, China</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2015-07">July 2015</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1681" to="1691" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">SubGram: extending skip-gram word representation with substrings</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kocmi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Bojar</surname></persName>
		</author>
		<idno>CoRR, abs/1806.06571</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">NLTK: the natural language toolkit</title>
		<author>
			<persName><forename type="first">E</forename><surname>Loper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Bird</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics</title>
				<meeting>the ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics<address><addrLine>Philadelphia</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Scikit-learn: Machine learning in Python</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pedregosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Varoquaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gramfort</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Michel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Thirion</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Grisel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Blondel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Prettenhofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Weiss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dubourg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Vanderplas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Passos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Cournapeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brucher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Perrot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Duchesnay</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2825" to="2830" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">GloVe: global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)<address><addrLine>Doha, Qatar</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014-10">Oct. 2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Evaluation of sentence embeddings in downstream and linguistic probing tasks</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">S</forename><surname>Perone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Silveira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">S</forename><surname>Paula</surname></persName>
		</author>
		<idno>CoRR, abs/1806.06259</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">V-measure: A conditional entropy-based external cluster evaluation measure</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rosenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Hirschberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)</title>
				<meeting>the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)<address><addrLine>Prague, Czech Republic</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2007-06">June 2007</date>
			<biblScope unit="page" from="410" to="420" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Silhouettes: A graphical aid to the interpretation and validation of cluster analysis</title>
		<author>
			<persName><forename type="first">P</forename><surname>Rousseeuw</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Comput. Appl. Math</title>
		<imprint>
			<biblScope unit="volume">20</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="53" to="65" />
			<date type="published" when="1987-11">Nov. 1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Cluster ensembles: A knowledge reuse framework for combining partitionings</title>
		<author>
			<persName><forename type="first">A</forename><surname>Strehl</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ghosh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Eighteenth National Conference on Artificial Intelligence</title>
				<meeting><address><addrLine>Menlo Park, CA, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="93" to="98" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Visualizing data using t-SNE</title>
		<author>
			<persName><forename type="first">L</forename><surname>van der Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="2579" to="2605" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Attention is all you need</title>
		<author>
			<persName><forename type="first">A</forename><surname>Vaswani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Parmar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Uszkoreit</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Jones</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Gomez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kaiser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Polosukhin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NIPS</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">A novel approach for automatic number of clusters detection in microarray data based on consensus clustering</title>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">X</forename><surname>Vinh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Epps</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Ninth IEEE International Conference on Bioinformatics and Bio-Engineering</title>
				<imprint>
			<date type="published" when="2009-06">June 2009</date>
			<biblScope unit="page" from="84" to="91" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<title level="m" type="main">Investigating the effects of word substitution errors on sentence embeddings</title>
		<author>
			<persName><forename type="first">R</forename><surname>Voleti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Liss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Berisha</surname></persName>
		</author>
		<idno>CoRR, abs/1811.07021</idno>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<monogr>
		<title level="m" type="main">Take and took, gaggle and goose, book and read: evaluating the utility of vector differences for lexical relation learning</title>
		<author>
			<persName><forename type="first">E</forename><surname>Vylomova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rimell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Cohn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Baldwin</surname></persName>
		</author>
		<idno>CoRR, abs/1509.01692</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">A broad-coverage challenge corpus for sentence understanding through inference</title>
		<author>
			<persName><forename type="first">A</forename><surname>Williams</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Nangia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">R</forename><surname>Bowman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NAACL-HLT</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
