<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Evaluation of Distributional Compositional Operations on Collocations through Semantic Similarity</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Drozdova</forename><surname>Ksenia</surname></persName>
							<email>drozdova.xenia@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">National Research University Higher School of Economics Moscow</orgName>
								<address>
									<country key="RU">Russia</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Evaluation of Distributional Compositional Operations on Collocations through Semantic Similarity</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">4E8D6C51989EC932578D76FFC71C4548</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T10:00+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>compositional distributional semantic models</term>
					<term>vector word representations</term>
					<term>word2vec</term>
					<term>semantic similarity</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper presents a comparative evaluation of compositional distributional semantic models. Central to our approach is the idea that the meaning of a phrase is a function of the meanings of its parts. We provide two vector space models, one for a lemmatized and one for an unlemmatized corpus, and four compositional functions, which we tested on a phrase similarity task. Our main goal is to estimate which method most accurately expresses the relationship between the whole and its parts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>This paper presents a comparative study of compositional distributional semantic models. The experiments were inspired by Gottlob Frege's classical principle of compositionality, that is, the meaning of a phrase is a function of the meaning of its parts <ref type="bibr" target="#b0">[1]</ref>.</p><p>Neural language models make it possible to test this statement and to select the function that, for our data, best expresses the connection between the whole and its parts. This work also addresses the question of whether it is more effective to lemmatize a corpus before training a model or to work with unlemmatized data.</p><p>To create the vector semantic space, the author used the predictive distributional semantics algorithms implemented in the word2vec toolkit <ref type="bibr" target="#b1">[2]</ref>. These algorithms build a vector space containing the words of the training corpus's lexicon. The coordinates of the words (their vectors) are initialized randomly; during training, the similarity between the vectors of words that occur near one another in the corpus is maximized, while the similarity between the vectors of words that do not co-occur is minimized. The logic behind this organization of the space rests on the idea that words found in similar contexts usually have similar meanings, while words whose contexts differ are semantically different. Semantic similarity of words in this vector space is measured by cosine similarity. Thus, neural models let us view a semantic map of the language from a computational standpoint.</p><p>The word2vec toolkit implements two training algorithms: Continuous Skip-gram (skip-gram) and Continuous Bag-of-Words (CBOW). The model is trained differently depending on the choice of algorithm. The objective of skip-gram is to predict a context from a word. The window size parameter defines what counts as a context; its value equals the maximum distance between the current word and the word being predicted. That is, the context of the word w_i with window size k is w_{i−k}, ..., w_{i−1}, w_{i+1}, ..., w_{i+k}. The objective of CBOW is the opposite: it predicts a word from its nearest neighbors <ref type="bibr" target="#b2">[3]</ref>.</p></div>
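The window definition above can be sketched in a few lines of Python (an illustrative snippet, not code from the paper):

```python
def context(tokens, i, k):
    """Return the context of tokens[i] within a window of size k:
    w_{i-k}, ..., w_{i-1}, w_{i+1}, ..., w_{i+k} (clipped at sentence edges)."""
    left = tokens[max(0, i - k):i]
    right = tokens[i + 1:i + 1 + k]
    return left + right

sentence = "the meaning of a phrase is a function of its parts".split()
print(context(sentence, 4, 2))  # context of 'phrase': ['of', 'a', 'is', 'a']
```

Skip-gram maximizes the probability of each of these context words given the center word; CBOW does the reverse.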
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Description of the parameters of the models</head><p>The Russian National Corpus was chosen as the training corpus for creating the semantic space. The study was conducted using two models, one trained on a lemmatized corpus and the other on an unlemmatized corpus. Henceforth these models are referred to as Lemm and Token respectively. In both models stop words were filtered out and the same set of hyperparameters was used: the dimensionality of the feature vectors is 300; the window size is 10; all words with total frequency lower than 5 are ignored; the training algorithm is skip-gram; negative sampling is used (5 samples); the number of iterations over the corpus is 5.</p><p>To construct the models the author used Gensim <ref type="bibr" target="#b5">[6]</ref>, particularly its Phrases module, which detects common phrases and replaces the spaces between their words with underscores. For example, the common phrase 'Третий Рим' ('Third Rome') is transformed into the token 'Третий_Рим' ('Third_Rome'). Thus, the model creates vectors not only for separate word forms but also for collocations. Henceforth such vectors are referred to as the baseline.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Composition functions</head><p>Let us return to the compositionality of phrases. In general, we can describe the representation of a phrase w consisting of words w_1, w_2, ..., w_n as a vector</p><formula xml:id="formula_0">\vec{w} = \vec{w}_1 \circ \vec{w}_2 \circ \dots \circ \vec{w}_n</formula><p>, where \circ can denote addition (+), point-wise multiplication (\odot), the tensor product (\otimes), or other operations on vectors. Such composition functions are described in detail in <ref type="bibr" target="#b4">[5]</ref>. Its authors, Jeff Mitchell and Mirella Lapata, studied methods based on element-wise multiplication and addition of vectors, count-based distributional semantic models, and models created with the aid of LDA. The work <ref type="bibr" target="#b3">[4]</ref> studies the use of such compositional methods with predictive algorithms.</p><p>This paper considers four methods of creating a phrase vector from its components (addition of vectors, element-wise multiplication, weighted addition, and circular convolution), along with the baseline (see Table <ref type="table">1</ref>). The weighted addition method assumes that the phrase components should carry different weights when added: the coefficient of the first component is \alpha, while the coefficient of the second component is \beta = 1 − \alpha. To evaluate which method best represents the semantic map of the language, the quality of each model is calculated.</p></div>
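The four composition functions can be sketched as follows for the component vectors x and y of a two-word phrase (an illustrative sketch; function names are ours, not the paper's):

```python
import numpy as np

def addition(x, y):
    return x + y

def multiplication(x, y):
    # element-wise (point-wise) product: p_i = x_i * y_i
    return x * y

def weighted_addition(x, y, alpha=0.6):
    # p = alpha*x + beta*y with beta = 1 - alpha
    return alpha * x + (1 - alpha) * y

def circular_convolution(x, y):
    # p_i = sum_j x_j * y_{(i-j) mod d}; computed via the FFT
    # (circular convolution theorem) for brevity.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

x = np.array([1.0, 0.0, 2.0])
y = np.array([0.5, 1.0, 0.0])
print(addition(x, y))
print(circular_convolution(x, y))
```

The baseline needs no function here: the collocation token produced by Phrases already has its own vector in the model.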
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head><p>Function and formula for each method:</p><formula xml:id="formula_1">Addition: \vec{p} = \vec{x} + \vec{y}, \quad p_i = x_i + y_i
Multiplication: \vec{p} = \vec{x} \odot \vec{y}, \quad p_i = x_i \cdot y_i
Weighted Addition: \vec{p} = \alpha\vec{x} + \beta\vec{y}, \quad p_i = \alpha x_i + \beta y_i
Circular Convolution: \vec{p} = \vec{x} \ast \vec{y}, \quad p_i = \sum_j x_j \cdot y_{i-j}
Baseline: p = x_y, \quad \vec{p} is produced by the algorithm
Table 1: Compositional methods</formula><p>The quality of the models is evaluated using Spearman's correlation coefficient between human judgements of the semantic similarity of phrases, the so-called 'gold standard'<ref type="foot" target="#foot_0">1</ref>, and the cosine similarity of the vectors of the same phrases:</p><formula xml:id="formula_2">similarity(\vec{p}_1, \vec{p}_2) = \cos(\vec{p}_1, \vec{p}_2) = \frac{\vec{p}_1 \cdot \vec{p}_2}{|\vec{p}_1| |\vec{p}_2|} \quad (1)</formula><p>The gold standard data consist mainly of phrases of the Adj+Noun type. In this experiment the author used 105 pairs of phrases, which the author translated into Russian. The phrases were lemmatized for the Lemm model.</p></div>
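The evaluation in Eq. (1) can be sketched as follows (toy vectors and scores below are illustrative, not the paper's gold standard):

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(p1, p2):
    # Eq. (1): cos(p1, p2) = p1 . p2 / (|p1| |p2|)
    return np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))

# toy phrase-pair vectors and corresponding human similarity judgements
pairs = [
    (np.array([1.0, 0.0]), np.array([1.0, 0.1])),  # near-synonymous pair
    (np.array([1.0, 0.0]), np.array([0.0, 1.0])),  # unrelated pair
    (np.array([1.0, 1.0]), np.array([1.0, 0.9])),  # near-synonymous pair
]
human = [6.5, 1.0, 6.0]

model_scores = [cosine(a, b) for a, b in pairs]
rho, _ = spearmanr(model_scores, human)
print(round(rho, 3))  # rho = 0.5 for these toy data
```

A higher rho means the model's cosine similarities rank phrase pairs more like the human annotators do.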
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiment results</head><p>During the experiments the author calculated the optimum weights for weighted addition: α = 0.6 for the first phrase component and β = 1 − α = 0.4 for the second. This can be seen in Fig. 1, where the horizontal axis shows the quality of the model and the vertical axis shows the value of the α parameter. Table 2 contains the results of the experiment: the values of Spearman's correlation coefficient between the methods under study and the gold standard. As the table shows, the best result belongs to the weighted addition method applied to a lemmatized corpus. Element-wise multiplication and circular convolution did not produce good results, which is easy to explain geometrically: these methods may place the new vector essentially arbitrarily relative to its components in the vector space, so the basic characteristics of the semantic space are not preserved.</p><p>Although the best result for creating a phrase vector belongs to the weighted addition method, the baseline has proved very strong as well: the correlation coefficients of the best method and the baseline differ by only 0.01.</p><p>Interestingly, the model trained on the unlemmatized corpus produced much worse results than the model trained on the lemmatized corpus. It should be noted, however, that weighted addition, the baseline, and simple addition proved the most effective methods on both models. This paper describes experiments on composing collocation vectors conducted with the aid of distributional semantic models. 
The study has two aims: to find out whether the model creates a better semantic space of the language with or without lemmatization, and to determine which of the four compositional methods described in this paper is the most effective for creating phrase vectors.</p><p>The most important result is that the neural language model forms a better vector representation for a lemmatized corpus, and the difference compared with the results for an unlemmatized corpus is quite considerable (about 20 percent).</p><p>The question of whether one should use compositional methods when working with collocations or turn to the natural baseline (uniting collocations into a single token before training) should be studied further, using more data that includes combinations of various parts of speech. This paper has shown that the quality of the baseline can be considered equal to the quality obtained with the best compositional operator. The research has determined that this operator is the weighted addition of vectors:</p><formula xml:id="formula_3">\vec{p} = 0.6\vec{x} + 0.4\vec{y}<label>(2)</label></formula></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. The dependence of the model quality on parameters for the weighted addition model</figDesc><graphic coords="4,167.81,116.83,276.66,77.13" type="bitmap" /></figure>
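The α sweep behind Fig. 1 can be sketched as a simple grid search (a hypothetical reconstruction with toy data, not the paper's code; the paper's sweep used the 105 gold-standard pairs):

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def quality(alpha, pairs, human):
    """Spearman correlation of weighted-addition similarities with human scores."""
    sims = [cosine(alpha * x1 + (1 - alpha) * y1,
                   alpha * x2 + (1 - alpha) * y2)
            for (x1, y1), (x2, y2) in pairs]
    return spearmanr(sims, human).correlation

# toy stand-ins: component vectors of phrase pairs plus human judgements
rng = np.random.default_rng(42)
pairs = [((rng.normal(size=8), rng.normal(size=8)),
          (rng.normal(size=8), rng.normal(size=8))) for _ in range(20)]
human = rng.uniform(1, 7, size=20)

alphas = np.arange(0.0, 1.01, 0.1)
best = max(alphas, key=lambda a: quality(a, pairs, human))
print(best)
```

On the paper's data this procedure selects α = 0.6, the coefficient in Eq. (2).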
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 2:</head><label>2</label><figDesc>Spearman ρ correlations of models with human judgements</figDesc><table><row><cell>Method</cell><cell>Lemm</cell><cell>Token</cell></row><row><cell>Addition</cell><cell>0.71157</cell><cell>0.53716</cell></row><row><cell>Multiplication</cell><cell>0.27132</cell><cell>0.22719</cell></row><row><cell>Weighted Addition</cell><cell>0.71335</cell><cell>0.54661</cell></row><row><cell>Circular Convolution</cell><cell>0.07431</cell><cell>0.07931</cell></row><row><cell>Baseline</cell><cell>0.70766</cell><cell>0.53014</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://adapt.seiee.sjtu.edu.cn/similarity/SimCompleteResults.pdf</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">On sense and reference</title>
		<author>
			<persName><forename type="first">G</forename><surname>Frege</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1892">1892. 1997</date>
			<biblScope unit="page" from="563" to="584" />
			<pubPlace>Ludlow</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="3111" to="3119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Evaluating neural word representations in tensor-based compositional settings</title>
		<author>
			<persName><forename type="first">D</forename><surname>Milajevs</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kartsaklis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sadrzadeh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Purver</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1408.6179</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Composition in distributional models of semantics</title>
		<author>
			<persName><forename type="first">J</forename><surname>Mitchell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lapata</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Cognitive science</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1388" to="1429" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Software Framework for Topic Modelling with Large Corpora</title>
		<author>
			<persName><forename type="first">R</forename><surname>Řehůřek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Sojka</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</title>
				<meeting>the LREC 2010 Workshop on New Challenges for NLP Frameworks<address><addrLine>Valletta, Malta</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2010-05">May 2010</date>
			<biblScope unit="page" from="45" to="50" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
