<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A model for high-coverage lexical semantic annotation generation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Attila</forename><surname>Novák</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Borbála</forename><surname>Siklósi</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">Pázmány Péter Catholic University Faculty of Information Technology and Bionics</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">MTA-PPKE Hungarian Language Technology Research Group</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">A model for high-coverage lexical semantic annotation generation</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">9CF7BFFEC9B38D474FD4AECDFB834990</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T20:59+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>AI applications often receive their input in the form of natural language text, or as the transcription of spoken language. A commonsense inference system should transform such input into a formal representation with a limited vocabulary in order to be able to process it. In this paper, we present a method based on neural word embeddings that automatically assigns semic features to words of natural language. These features either describe the ontological category of a given word or provide some characterization or additional information. We show that our method has high coverage, performs well for English and Hungarian, and can easily be extended to other languages as well.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>One of the most natural representations of commonsense knowledge is natural language. What people think or know about the world is expressed in either spoken or written language. Due to the popularity and accessibility of on-line media, crowds of people put their knowledge into written texts, either in the form of very short comments on social media sites or in the form of longer posts, in addition to the writings of professional journalists. These texts, which are produced on a daily basis, adapt to changes in language use, and not only general knowledge, but facts and beliefs about the actual state of the world are also represented in them. Moreover, not only standard language, but slang and words used in informal contexts and special domains are also present in texts collected from the Web. In addition, more and more books representing a wide range of domains and styles are digitized. Large written corpora consisting of these resources are available as raw material for research, and can be exploited as a source of knowledge.</p><p>A more structured form of knowledge representation is hand-crafted ontologies, such as WordNet <ref type="bibr">(Fellbaum 1998;</ref><ref type="bibr" target="#b7">Miller 1995)</ref> or DBpedia <ref type="bibr" target="#b4">(Lehmann et al. 2015)</ref>. In WordNet, concepts are collected into synonym sets and are organized into a strictly hierarchical structure of hyponymy relations, along with some horizontal relations, like meronymy. However, WordNet has been criticized for its overly fine granularity at the bottom level and its excessive generality at the top level <ref type="bibr" target="#b1">(Brown 2008)</ref>. Moreover, its middle layers also contain many concepts that may be appropriate in a scientific taxonomy, like 'fissiped mammal.n', but are not present in everyday language use. Similar problems concern most other structured knowledge bases. 
Moreover, since they are extremely costly to produce or extend to achieve good lexical coverage, these resources are static in nature: they are not able to keep up with changes in language use and daily life, and they contain only standard word forms.</p><p>Whatever its source, a knowledge base is an essential component of a commonsense inference system. Even though recent results achieved by applying deep neural systems to raw textual input have been significant, traditional inference systems first transform their input written in natural language into a formal representation using features extracted from one or more knowledge bases, and then try to solve the given task based on this formal representation. In order to be able to process arbitrary input, the coverage of the knowledge bases used should be as high as possible <ref type="bibr" target="#b2">(Davis 1990)</ref>.</p><p>In this paper, we present an automatic method that is able to assign semantic features or atomic predicates to practically any (even non-standard/slang or misspelled) word form in a text in a language-independent manner. As we apply morphological analysis and lemmatization to the corpus both at the time of generating the embedding models and at query time, all forms of a single lemma are covered instead of only those explicitly present in the original corpus. This is essential for achieving good coverage for an agglutinating language like Hungarian, where a single lexeme may have hundreds of possible word forms, only a few of which are actually present even in a huge corpus. Instead of constructing another static knowledge base with a fixed vocabulary, we propose a dynamic tool that can be retrained or fine-tuned at any time using an up-to-date, possibly domain-specific corpus appropriate to the task at hand. The target formalism or set of semantic features to be used is also an interchangeable parameter of the proposed method. 
The set of features and predicates presented in this paper is derived from formalized definitions of a subset of the headwords (including the defining vocabulary) of the Longman Dictionary of Contemporary English (LDOCE) <ref type="bibr" target="#b14">(Summers 2005)</ref>. Both the vocabulary of the model and the features used are embedded in a word embedding vector space model created by a neural network <ref type="bibr">(Mikolov et al. 2013)</ref>.</p><p>Before we present the structure of the paper, let the following example, the sentence The cow gives milk to her calf., illustrate the kind of semantic annotation automatically assigned by the model. The paper is structured as follows: first, a brief introduction to neural word embeddings is presented. This is followed by the description of the lexical resource that we used when creating our models. In the following section, the method of building the model is described. In this paper, the method is demonstrated for English. However, existing semantic resources can also be mapped to word embedding spaces over the vocabulary of other languages. We have performed experiments with Hungarian, an agglutinative language with scarce semantic resources, but the method can easily be applied to other languages as well. Finally, we present both a qualitative and a quantitative evaluation of the models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Word Embedding Models</head><p>Traditional models of distributional semantics build word representations by counting words occurring in a fixed-size context of the target word <ref type="bibr" target="#b0">(Baroni, Dinu, and Kruszewski 2014)</ref>. In contrast, more recent methods for building distributional representations of words use neural networks to generate word embedding models <ref type="bibr">(Mikolov et al. 2013;</ref><ref type="bibr" target="#b11">Pennington, Socher, and Manning 2014)</ref>, the most influential implementation of which is word2vec<ref type="foot" target="#foot_0">1</ref>.</p><p>When training embedding models, a fixed-size context of each word in the vocabulary is used as the input of a neural network. The network is trained using back-propagation, adjusting the weights assigned to the connections between the input neurons (each corresponding to an item in the whole vocabulary) and the projection layer of the network. This weight vector can finally be extracted and used as the embedding vector of the target word. Since similar words are used in similar contexts, these vectors optimized for prediction will also be similar for similar words. There are two types of neural networks used for this task. One of them is the so-called CBOW (continuous bag-of-words) model, in which the network is used to predict the target word from the context, while the other model, called skip-gram, is used to predict the context from the target word. In both cases, the embedding vectors can be extracted from the middle layer of the network and used as a dense vector representation of the meaning of the words.</p><p>The vectors thus obtained point to certain locations in the semantic space consistently, so that semantically and/or syntactically related words are close to each other, while unrelated ones are more distant. 
Moreover, it has been shown that vector operations can also be applied to these representations: the semantic relation between two words can be captured as the algebraic difference of the two vectors representing these words. Similarly, the meaning of the composition of two (or more) words is generally well represented by the sum of the corresponding embedding vectors <ref type="bibr" target="#b6">(Mikolov, Yih, and Zweig 2013)</ref>.</p><p>As the words are represented as dense real-valued vectors, the similarity of two words can easily be defined in terms of the angle between their vectors, i.e. the most similar words for a query word can be retrieved by finding its nearest neighbours in the vector space according to cosine distance.</p><p>One of the main drawbacks of building such a model from raw corpora, however, is that by itself it is not able to handle polysemy and homonymy, because a single representational vector is built for each lexical element regardless of the number of its different senses. We applied a simple method to alleviate this problem, at least in cases where the homonyms have different parts of speech. In order to assign different vectors to the same word with different parts of speech, we applied PoS tagging and lemmatization to the training corpora before building the model. The main PoS tag of each word was attached to the word as a suffix in the form lemma#PoS, thus a different embedding vector was created for homonymous lemmas with different parts of speech.</p><p>We trained an English word embedding model on the English Wikipedia dump<ref type="foot" target="#foot_1">2</ref> of 2.25 billion tokens (8.24 M token types) that was annotated using the Stanford tagger <ref type="bibr" target="#b15">(Toutanova et al. 2003)</ref>. 
Since the CBOW model has proved to be more efficient for large training corpora, we used this model architecture for training, setting the radius of the context window to 5, the number of dimensions to 300, and the minimum token frequency to 5.</p><p>Figure <ref type="figure" target="#fig_0">1</ref> illustrates how the words pianist, teacher, turner, maid and their three nearest neighbors are arranged in the English word embedding space<ref type="foot" target="#foot_2">3</ref>. The original vectors consist of 300 dimensions, but these were mapped to a 2D representation using the t-SNE algorithm (van der Maaten and Hinton 2008).</p></div>
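The cosine-based nearest-neighbour lookup over lemma#PoS keys described above can be sketched as follows. This is a minimal illustration with tiny toy vectors; the words, vectors, and dimensionality are invented for the example (a real model would use the 300-dimensional vectors trained on Wikipedia):

```python
import numpy as np

# Toy embedding table keyed by lemma#PoS, following the preprocessing
# described in the text. Vectors are illustrative, not from a real model.
emb = {
    "dog#NN":   np.array([0.9, 0.1, 0.0]),
    "cat#NN":   np.array([0.8, 0.2, 0.1]),
    "eat#VB":   np.array([0.1, 0.9, 0.2]),
    "drink#VB": np.array([0.2, 0.8, 0.3]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(query, k=2):
    """Return the k nearest entries to `query` by cosine similarity."""
    q = emb[query]
    scored = [(w, cosine(q, v)) for w, v in emb.items() if w != query]
    return sorted(scored, key=lambda x: -x[1])[:k]

print(nearest("dog#NN", k=1))  # cat#NN is the closest neighbour here
```

Because homonymous lemmas carry distinct PoS suffixes (e.g. a hypothetical turner#NN vs. turner#NNP), each gets its own row in the table and its own neighbourhood.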
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Lexical Resources</head><p>Our goal was to create a model that can assign semantic features and elementary predicates to words in an arbitrary text. Thus, first, the set of features to be used had to be defined. The Longman Dictionary of Contemporary English (LDOCE) <ref type="bibr" target="#b14">(Summers 2005</ref>) is a traditional dictionary containing words and their definitions. All definitions in the dictionary are written using a constrained defining vocabulary (the Longman Defining Vocabulary (LDV)). The definitions of a subset of headwords in LDOCE, including all items in the LDV and the most frequent words listed in the BNC and the Google unigram count, were transformed into a formal description containing only unary and binary predicates in a resource called 4lang <ref type="bibr" target="#b3">(Kornai et al. 2015)</ref>. The format of these definitions is illustrated by the following examples (for the explanation of the notation used in these definitions, see <ref type="bibr">(Kornai et al. 2015)</ref>). We further transformed this format so that we have category labels (here: unary and binary predicates) and listed examples. This was achieved by segmenting the formal descriptions into elementary predicates (by splitting at commas), but we did not segment predicates into further parts, so e.g. HAS{four(legs)} remained an atomic feature. Each such token was treated as a category label. Then, all words that had the particular token in their definition were listed as examples for that label. This resulted in 1489 category labels and 12,507 words listed as examples for them. Then, in order to make this resource compatible with the word embedding model built from the Wikipedia corpus, its vocabulary was intersected with the vocabulary of that model. 
Even though the vocabulary of this resource consists mostly of frequent words used in LDOCE definitions, it also includes some affixes, inflected forms, and a few multiword items, which are not present in the lemmatized Wikipedia model, so the intersection resulted in 11,039 words. Table <ref type="table">1</ref> shows example words for some of the features derived from the 4lang resource.</p><p>However, some categories were too broad and the set of words listed for them was too heterogeneous. To handle this problem, a hierarchical agglomerative clustering algorithm was applied to the set of words in those categories that contained at least five words. The reason for applying hierarchical clustering rather than k-means is based on the argument of <ref type="bibr" target="#b12">(Pereira, Tishby, and Lee 1993)</ref>, who argue that, due to the considerable variability of written texts, the number of clusters of the concepts used in a certain text cannot be predicted. A hierarchical organization, however, is appropriate for producing compact groups of words and phrases based on the actual text, rather than on some predefined generalization. The linkage method for the hierarchical clustering was chosen based on the cophenetic correlation between the original data points and the resulting linkage matrix <ref type="bibr">(Sokal and Rohlf 1962)</ref>. The best correlation was achieved when using Ward's distance criterion <ref type="bibr" target="#b16">(Ward 1963)</ref>, resulting in small and dense groups of terms at the lower levels of the resulting dendrogram. However, we did not need the whole hierarchy, represented as a binary tree, but separate, compact groups of terms, i.e. well-separated subtrees of the dendrogram. The most intuitive way of defining these cutting points of the tree is to find large jumps in the clustering levels. 
To put it more formally, the height of each link in the cluster tree is compared with the heights of neighbouring links below it up to a certain depth. If this difference is larger than a predefined threshold value (i.e. the link is inconsistent), then the link is a cutting point. For more details of the clustering algorithm, see <ref type="bibr">(Siklósi 2016)</ref>. Each cluster was then labeled with the original category label with a numeric index added.</p><p>Even though we present our method using only the 4lang dictionary as a lexical resource, the system can be built from any dictionary that can be transformed to a similar format.</p></div>
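The clustering step just described can be sketched with SciPy's hierarchical clustering tools. The data below are synthetic stand-ins for the embedding vectors of example words in one broad category, and the inconsistency threshold is illustrative, not the value used in the paper:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
# Two well-separated subgroups standing in for a heterogeneous category.
X = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(10, 5)),
    rng.normal(loc=5.0, scale=0.1, size=(10, 5)),
])

# Ward linkage, selected (as in the text) by cophenetic correlation
# between the original pairwise distances and the dendrogram.
Z = linkage(X, method="ward")
c, _ = cophenet(Z, pdist(X))
print(f"cophenetic correlation: {c:.3f}")

# Cut the dendrogram at inconsistent links: links whose height is much
# larger than the heights of the links below them (within `depth` levels).
labels = fcluster(Z, t=1.15, criterion="inconsistent", depth=2)
print(sorted(set(labels)))  # the two subgroups fall into separate clusters
```

The `inconsistent` criterion implements exactly the "large jump" idea: a link is a cutting point when its height deviates from the mean height of nearby links by more than `t` standard deviations.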
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Method</head><p>Our objective was to create a model with high lexical coverage that can also return the most relevant semantic features for words not present in 4lang. In order to achieve this goal, the semantic features from this controlled set were projected into the embedding space containing the representation of the words. The nearest feature neighbors of each word can then be retrieved from the model using the cosine distance metric.</p><p>For each indexed semantic predicate label output by the clustering algorithm, we iterated over the list of example words annotated with their part-of-speech (the crude PoS tags used in the 4lang resource had to be mapped to the more fine-grained PTB tags returned by the Stanford tagger) and retrieved their embedding vectors from the word embedding model built from the PoS-tagged Wikipedia corpus. As a simple but effective way of deriving a single representation for a set of words, we took the mean of their embedding vectors and used it as the embedding vector of the particular semantic feature. Thus a representation of each predicate used in the definitions was obtained in the semantic space created from the English PoS-tagged corpus. These semantic feature vectors were kept separate from the word vectors in the original embedding model in order to be able to restrict lookup to either words or features derived from each lexical resource.</p><p>To find the relevant features for a query word tagged with its appropriate part-of-speech, its representational vector is retrieved from the word embedding model and its nearest neighbors are taken from the model containing the semantic predicates. Since nearest neighbors are searched for instead of exact matches, out-of-vocabulary words (with respect to the original lexical resources) can also be assigned semantic labels. 
The only requirement is that the word must be present in the word embedding model.</p></div>
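The two steps above, positioning each feature at the mean of its example words' vectors and ranking features by cosine similarity to a query word, can be sketched as follows. All vectors, words, and feature labels here are toy data for illustration:

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

# Toy word vectors (a real model would be the 300-d Wikipedia embedding).
word_vecs = {
    "horse#NN": np.array([1.0, 0.1, 0.0]),
    "tiger#NN": np.array([0.9, 0.2, 0.1]),
    "eat#VB":   np.array([0.0, 1.0, 0.2]),
    "drink#VB": np.array([0.1, 0.9, 0.3]),
    "dog#NN":   np.array([0.95, 0.15, 0.05]),  # not in any feature's example list
}

# Each semantic feature is positioned at the mean of its example words'
# vectors, and kept in a separate table so lookup can be restricted to
# features only (as described in the text).
feature_examples = {
    "HAS{four(legs)}": ["horse#NN", "tiger#NN"],
    "=AGT.HAS.mouth":  ["eat#VB", "drink#VB"],
}
feature_vecs = {
    f: np.mean([word_vecs[w] for w in ws], axis=0)
    for f, ws in feature_examples.items()
}

def nearest_features(word, k=1):
    """Rank features by cosine similarity to the word's embedding vector."""
    q = normalize(word_vecs[word])
    scored = [(f, float(np.dot(q, normalize(v)))) for f, v in feature_vecs.items()]
    return sorted(scored, key=lambda x: -x[1])[:k]

# A word that is out-of-vocabulary w.r.t. 4lang still gets a feature,
# as long as it has a vector in the word embedding model:
print(nearest_features("dog#NN"))
```

The separation of `word_vecs` and `feature_vecs` mirrors the design choice in the text: query vectors come from the word table, but neighbours are searched only among the feature vectors.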
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Other languages</head><p>We also carried out some experiments to apply our method to another language, Hungarian. Hungarian is an agglutinative language with very few lexical semantic resources.</p><p>As the original 4lang dictionary contained the Hungarian translation of the vocabulary included (3477 words), it was straightforward to create a similar model for Hungarian as well. For this, we had to create a Hungarian word embedding model, which was built from a web-crawled corpus of 3.18 billion tokens (27.49 M token types) that was annotated using the PurePos <ref type="bibr" target="#b10">(Orosz and Novák 2013</ref>) tagger, augmented with the Humor Hungarian morphological analyzer <ref type="bibr" target="#b9">(Novák 2014;</ref><ref type="bibr" target="#b8">Novák, Siklósi, and Oravecz 2016)</ref>. We applied the method described above to define the position of the features in the Hungarian word embedding space by calculating the mean of the vector representations of the Hungarian example words for each semantic predicate. Our approach can easily be extended to any other language by translating this dictionary of moderate size (relative to complicated knowledge bases). Furthermore, this method also adapts to differences in word usage in different languages, since words are represented with their embedding vector in the target language.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Experiments and Results</head><p>The aim of this research was to investigate the possibility of providing a high-coverage tool that dynamically assigns a semantic representation to the words of a natural language input, instead of using a static knowledge base with a limited vocabulary. Thus, we first investigated the performance of the tool for some example input, and then also performed a quantitative analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Qualitative analysis</head><p>Table <ref type="table" target="#tab_2">2</ref> shows an example: Laika likes eating fried onion with cucumber. First, using the Stanford parser, the input is annotated with part-of-speech tags and each word is lemmatized. Then, for each lemmatized content word (i.e. omitting the function word with) with its corresponding part-of-speech, the top 10 nearest features are retrieved from the model and ordered by their distance from the vector representing the target word in the embedding space. Note that the number of top n features generated for each word is a free parameter, but moving further in the semantic space results in less and less appropriate features for the target word. Table <ref type="table" target="#tab_3">3</ref> shows the WordNet hypernyms assigned to each content word in the same sentence (the representations of the adjective fried and the proper name Laika are missing from WordNet).</p><p>As can be seen in the example, our model is able to assign two types of features to words. Ontological/taxonomic categories, such as carnivorous and mammal for the word Laika, or vegetable and food for the words onion and cucumber, appear together with characteristic features of the given concept, such as faithful, HAS{four(legs)}, AT/2744.farm or round and CAUSE{food.HAS.taste}. While the first type of features can be extracted from traditional ontologies, the latter type of characteristics cannot. However, we believe that the latter type of features forms an important part of commonsense knowledge, because if people are asked to describe a concept, they are more likely to use such characteristics. Moreover, an inference system can also benefit from such descriptions. It can also be seen from the example that the model "knows" that Laika is a dog by returning semantic features characterizing dogs. 
In addition, the feature EAT.flesh emphasizes the contrast between Laika being a dog and eating cucumber and onion.</p><p>Another benefit of our model, as mentioned above, is that it is able to generate features for all the words that are present in the original corpus the word embedding was built from, not only for the extremely limited set of words included in the 4lang dictionary. WordNet and other hand-made resources are limited to the words and the classification that the designers of the resource had in mind. Our model, in contrast, is able to assign features to proper names, slang words or mistyped word forms as well, as long as these are represented in the corpus the word embedding model was created from. In addition to the above example containing the dog name Laika, the following examples show some of the nearest features for two more proper names and two slang words:
IBM: information.IN, computer, equipment, electric, group
Facebook: information.ON, ABOUT.recent(events), computer
hype: fame, fun, idea, popular, surprise
numpty: bad, lazy, stupid, lack(work), dull
A weakness of our method is that in some cases it also introduces noise into the generated features. For example, features such as sleep or sing generated for the verb eat are not ones we would expect to be part of the definition of eat (even if in a broader sense they might be related). Inappropriate features like these may be eliminated manually from the representations generated by the model. The model can thus also be used as an aid in a semi-automatic semantic resource creation/extension process, proposing an initial representation that can be cleaned manually for applications that require a high-precision lexical semantic representation. Otherwise, the generated semantic features can be used in models performing some downstream task even without filtering out the noise. 
In that case, the added semantic features may improve the performance of the downstream tool by providing mostly useful features for words that would otherwise completely lack a semantic representation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Quantitative analysis</head><p>We also carried out two kinds of quantitative analysis of the performance of our model. First, we checked the robustness of the model by performing a sanity check. For each word present in the original 4lang dictionary, we calculated how many of the semantic features present in the original definition were retrieved among the top N features returned by the model (feature recall, R_f) and the percentage of words for which all features were retrieved (word recall, R_w). The results are shown in Table <ref type="table" target="#tab_4">4</ref> as a function of N (numbers are percentages). Recall was also calculated ignoring words having more than N features (R_w^p) and discounting features over the N limit for words having more than N features (R_f^p). As no definition contained more than 10 terms, R_w^p is identical to R_w and R_f^p is identical to R_f for N ≥ 10. The definitions are terse and contain a minimal description for each word: half of the words contain only a single term, and almost all contain no more than 5 (see column |f| ≤ N). Feature precision (P(f)) apparently decreases quickly as the number of features retrieved increases if we blindly accept only terms present in the original definitions as correct. See, however, further discussion below. The last column of the table shows the mean average precision (MAP) of features (terms) present in the original definitions.</p><p>In the other experiment, we randomly selected 280 words not present in the original dictionary from a predefined list of Hungarian words in which each word was assigned to one of 28 semantic domains (e.g. food, vehicles, locations, occupations, etc.). From each domain, 10 words were chosen randomly and translated to English. 
Then, for these words, the 10 nearest features were generated and two human annotators checked whether each feature was adequate for the given word. The same evaluation was performed for Hungarian. The agreement ratio between the annotators was 0.798 for English and 0.734 for Hungarian according to Cohen's kappa, which is substantial in both cases. The results are shown in Table <ref type="table" target="#tab_5">5</ref>.</p><p>The table shows feature accuracy (acc: the ratio of correctly assigned features) in each domain. We also automatically computed feature "domain accuracy" (d-acc): here we ignored feature assignment errors where the same feature was marked adequate for another test word in the same domain. The number of different features that appeared in this evaluation and the number of features marked wrong at least once are shown in the last two columns. Note that the feature accuracy (precision) for 10 features retrieved turned out to be much higher (75.13%) than in the sanity check experiment (only 32.70%), even though this list contained words not in the original resource. The reason for this is that the model returns many features which, while not explicitly present in the original terse definitions, correctly follow from the knowledge embodied in the feature model. E.g., while the definition of dog in 4lang contains only 3 terms (animal, faithful and carnivorous), the top 10 features retrieved from the model also include mammal, HAS{four(legs)}, hairy and companion. The sanity check experiment thus grossly underestimated the precision of the model.</p></div>
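The sanity-check metrics above can be sketched in a few lines. The definitions and feature rankings below are illustrative toy data (only dog's entry roughly follows the 4lang example in the text); the sketch also shows why precision looks pessimistic when every feature outside the terse gold definition counts as an error:

```python
def evaluate(definitions, ranked_features, n):
    """Compute feature recall R_f, word recall R_w, and feature precision
    P(f) for the top-n features retrieved for each word."""
    words = list(definitions)
    full_words = hits = total_retrieved = 0
    for w in words:
        gold = set(definitions[w])          # features in the original definition
        top_n = set(ranked_features[w][:n]) # features returned by the model
        found = gold & top_n
        hits += len(found)
        total_retrieved += len(top_n)
        full_words += found == gold         # every defining feature retrieved
    r_f = hits / sum(len(g) for g in definitions.values())  # feature recall
    r_w = full_words / len(words)                           # word recall
    p_f = hits / total_retrieved                            # feature precision
    return r_f, r_w, p_f

definitions = {
    "dog":  ["animal", "faithful", "carnivorous"],
    "milk": ["food", "liquid"],
}
ranked_features = {
    "dog":  ["animal", "mammal", "faithful", "carnivorous", "hairy"],
    "milk": ["food", "liquid", "sweet", "drink", "white"],
}
print(evaluate(definitions, ranked_features, n=5))
```

Here both recalls reach 1.0, yet precision is only 0.5, even though extra features like mammal or hairy are arguably correct; this is exactly the effect the manual evaluation in Table 5 corrects for.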
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusion</head><p>We have presented an automatic method that is able to assign semantic features to words of natural language. This approach exploits the representative power of neural word embeddings by mapping features derived from formal definitions of words to the vector space of the given language. In addition to some illustrative examples, we have presented an evaluation of the models demonstrating that the method works with relatively high accuracy. Although there is a moderate amount of noise in the set of generated features, the method has very high coverage, being able to process proper names and non-standard words as well, which cannot all be included in hand-made static knowledge bases. As such, our automatic method can be used as the basis of a manually constructed resource, or can provide valuable input for downstream applications, such as commonsense inference systems.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The arrangement of the 3 nearest neighbors of the words pianist, teacher, turner, maid in the English word embedding space</figDesc><graphic coords="3,61.80,180.71,222.91,145.73" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>lowing example illustrate the kind of semantic annotation automatically assigned by the model to words in the sentence The cow gives milk to her calf.:</figDesc><table /><note>cow: mammal, at_farm, produce_milk, HAS{four(legs)}, animal gives: =AGT.CAUSE{=DAT.HAS.=PAT}, give, offer, communicate milk: food, sweet, drink, liquid calf: young, mammal, animal, has_wool. HAS{four(legs)}</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head></head><label></label><figDesc>al. 2015)):</figDesc><table><row><cell>Category</cell><cell>Example words in 4lang</cell></row><row><cell></cell><cell>bread: food, FROM/2742 flour, bake MAKE</cell></row><row><cell></cell><cell>(a type of food made from flour and water that is</cell></row></table><note>PART OF.body body#NN, tongue#NN, back#NN, neck#NN, shoulder#NN, bone#NN, skin#NN, wrist#NN, buttock#NN etc. =AGT.HAS.mouth swallow#VB, suck#VB, eat#VB, drink#VB HAS{four(legs)} horse#NN, tiger#NN mammal mammal#NN, lion#NN, deer#NN, man#NN, horse#NN, sheep#NN, cattle#NN, rabbit#NN, cat#NN, pig#NN, goat#NN, cow#NN =AGT.HAS.mind read#VB, remember#VB, feel#VB, understand#VB =AGT.CAUSE{=DAT.KNOW.=PAT} express#VB, teach#VB Table 1: Example words for some semantic features (predicates) after transforming the definitions to the format consisting of labels and example words</note></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 :</head><label>2</label><figDesc>An example sentence, Laika likes eating fried onion with cucumber, with features assigned to each word using our method</figDesc><table><row><cell>Original word</cell><cell>Analyzed word</cell><cell>Features</cell></row><row><cell>Laika</cell><cell>Laika#NNP</cell><cell>carnivorous mammal faithful HAS.short(hair/3359) HAS{four(legs)} AT/2744.farm companion young EAT.flesh HAS.long(tail)</cell></row><row><cell>likes</cell><cell>like#VB</cell><cell>want =PAT{person} wish emotion ask =AGT.HAS.mind annoy =PAT.IN/2758.mind communicate desire =AGT.HAS.body</cell></row><row><cell>eating</cell><cell>eat#VB</cell><cell>swallow =AGT.HAS.mouth eat love INSTRUMENT.tongue =AGT.CAUSE{=PAT{move}} sleep suck sing touch rest</cell></row><row><cell>fried</cell><cell>fried#JJ</cell><cell>food '.COOK/825 '.SERVE thick/2134 FROM/2742.flour bake.MAKE FROM/2742.milk food.IN/2758 vegetable sweet bread</cell></row><row><cell>onion</cell><cell>onion#NN</cell><cell>'.COOK/825 vegetable fruit food FROM/2742.milk sweet round soft thick/2134 PART OF.plant</cell></row><row><cell>with</cell><cell>with#IN</cell><cell></cell></row><row><cell>cucumber</cell><cell>cucumber#NN</cell><cell>vegetable fruit food '.COOK/825 sweet '.EAT round CAUSE{food.HAS.taste} PART OF.plant soft</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 :</head><label>3</label><figDesc>An example sentence, Laika likes eating fried onion with cucumber, with hypernyms from WordNet assigned to each word</figDesc><table><row><cell>Original word</cell><cell>Analyzed word</cell><cell>Hypernyms</cell></row><row><cell>Laika</cell><cell>Laika#NNP</cell><cell></cell></row><row><cell>likes</cell><cell>like#VB</cell><cell>desire want</cell></row><row><cell>eating</cell><cell>eat#VB</cell><cell>consume digest take in take have</cell></row><row><cell>fried</cell><cell>fried#JJ</cell><cell></cell></row><row><cell>onion</cell><cell>onion#NN</cell><cell>vegetable produce food solid matter physical entity entity</cell></row><row><cell>with</cell><cell>with#IN</cell><cell></cell></row><row><cell>cucumber</cell><cell>cucumber#NN</cell><cell>vegetable produce food solid matter physical entity entity</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4 :</head><label>4</label><figDesc>Performance of the model for English tested on definitions in the 4lang vocabulary as a function of the number N of top-ranked features retrieved for each word. R w : Word recall (words for which all features were retrieved), R w (poss): recall for words having no more than N features, R f : feature recall, R f (poss): feature recall ignoring features over the top N , |f | ≤ N : percentage of words having no more than N features, P (f ): feature precision, MAP: mean average precision of features. Numbers are percentages.</figDesc><table /></figure>
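The evaluation measures defined in Table 4 can be made concrete with a small sketch. The function names and data layout below are illustrative assumptions, not the paper's published evaluation code: `predicted` maps each word to its ranked feature list, `gold` maps each word to its gold feature set, and the metrics are computed at a cutoff `n`:

```python
def average_precision(ranked, gold, n):
    # Average precision of a ranked feature list against a gold feature
    # set, cut off at the top n ranked features.
    hits, ap = 0, 0.0
    for i, feat in enumerate(ranked[:n], start=1):
        if feat in gold:
            hits += 1
            ap += hits / i
    denom = min(len(gold), n)
    return ap / denom if denom else 0.0


def evaluate(predicted, gold, n):
    # predicted: word -> ranked list of features; gold: word -> set of
    # gold features. Returns Table-4-style metrics at cutoff n.
    words = list(gold)
    # Words for which every gold feature appears in the top n
    full = sum(1 for w in words if gold[w] <= set(predicted[w][:n]))
    tp = sum(len(gold[w] & set(predicted[w][:n])) for w in words)
    retrieved = sum(len(predicted[w][:n]) for w in words)
    relevant = sum(len(gold[w]) for w in words)
    return {
        "R_w": full / len(words),                      # word recall
        "R_f": tp / relevant,                          # feature recall
        "P_f": tp / retrieved if retrieved else 0.0,   # feature precision
        "MAP": sum(average_precision(predicted[w], gold[w], n)
                   for w in words) / len(words),
    }
```

For example, with `predicted = {"bread": ["food", "bake", "flour"]}` and `gold = {"bread": {"food", "flour"}}`, a cutoff of `n=2` credits only one of the two gold features, halving both feature recall and MAP for that word.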
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 5 :</head><label>5</label><figDesc>Performance of the model on 280 different test words for English and Hungarian. acc: feature accuracy, dacc: domain accuracy of features, #F: different features, #B:</figDesc><table /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://code.google.com/archive/p/word2vec/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">Downloaded from https://dumps.wikimedia.org/ in May 2016.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">The PoS tag is NN for all example words, and it is omitted from the figure.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research was supported by grant FK125217 of the National Research, Development and Innovation Office of Hungary, financed under the FK17 funding scheme.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Don&apos;t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors</title>
		<author>
			<persName><forename type="first">M</forename><surname>Baroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Dinu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Kruszewski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</title>
		<title level="s">Long Papers</title>
		<meeting>the 52nd Annual Meeting of the Association for Computational Linguistics<address><addrLine>Baltimore, Maryland</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="238" to="247" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Choosing sense distinctions for WSD: Psycholinguistic evidence</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">W</forename><surname>Brown</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, HLT-Short &apos;08</title>
				<meeting>the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, HLT-Short &apos;08<address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="249" to="252" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Representations of Commonsense Knowledge</title>
		<author>
			<persName><forename type="first">E</forename><surname>Davis</surname></persName>
		</author>
		<imprint>
			<publisher>Morgan Kaufmann</publisher>
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2a">
	<monogr>
		<title level="m" type="main">WordNet: an electronic lexical database</title>
		<editor>
			<persName><forename type="first">C</forename><surname>Fellbaum</surname></persName>
		</editor>
		<imprint>
			<publisher>MIT Press</publisher>
			<date type="published" when="1998">1998</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Competence in lexical semantics</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kornai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ács</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Makrai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Nemeskey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Pajkossy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Recski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics</title>
				<meeting>the Fourth Joint Conference on Lexical and Computational Semantics<address><addrLine>Denver, Colorado</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="165" to="175" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">DBpedia -a large-scale, multilingual knowledge base extracted from wikipedia</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Isele</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Jakob</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jentzsch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Kontokostas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">N</forename><surname>Mendes</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hellmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Morsey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Van Kleef</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Auer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bizer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Semantic Web Journal</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="167" to="195" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013</title>
				<meeting><address><addrLine>Lake Tahoe, Nevada, United States</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="3111" to="3119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Linguistic regularities in continuous space word representations</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Yih</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zweig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Human Language Technologies: Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings</title>
				<meeting><address><addrLine>Atlanta, Georgia, USA</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="746" to="751" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">WordNet: A lexical database for English</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">A</forename><surname>Miller</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">38</biblScope>
			<biblScope unit="page" from="39" to="41" />
			<date type="published" when="1995">1995</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A New Integrated Open-source Morphological Analyzer for Hungarian</title>
		<author>
			<persName><forename type="first">A</forename><surname>Novák</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Siklósi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Oravecz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Goggi</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Grobelnik</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Mazo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Moreno</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<meeting>the Tenth International Conference on Language Resources and Evaluation (LREC 2016)<address><addrLine>Paris, France</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">A new form of humor -mapping constraint-based computational morphologies to a finitestate representation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Novák</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC&apos;14)</title>
				<editor>
			<persName><forename type="first">N</forename><surname>Calzolari</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Choukri</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Declerck</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Loftsson</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Maegaard</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Mariani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Moreno</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Odijk</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Piperidis</surname></persName>
		</editor>
		<meeting>the Ninth International Conference on Language Resources and Evaluation (LREC&apos;14)<address><addrLine>Reykjavik, Iceland</addrLine></address></meeting>
		<imprint>
			<publisher>ELRA</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">PurePos 2.0: a hybrid tool for morphological disambiguation</title>
		<author>
			<persName><forename type="first">G</forename><surname>Orosz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Novák</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2013)</title>
				<meeting>the International Conference on Recent Advances in Natural Language Processing (RANLP 2013)<address><addrLine>Hissar, Bulgaria; Shoumen, BULGARIA</addrLine></address></meeting>
		<imprint>
			<publisher>INCOMA Ltd</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="539" to="545" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">GloVe: Global Vectors for Word Representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Empirical Methods in Natural Language Processing (EMNLP)</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Distributional clustering of English words</title>
		<author>
			<persName><forename type="first">F</forename><surname>Pereira</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tishby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Lee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st Annual Meeting on Association for Computational Linguistics, ACL &apos;93</title>
				<meeting>the 31st Annual Meeting on Association for Computational Linguistics, ACL &apos;93<address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="1993">1993</date>
			<biblScope unit="page" from="183" to="190" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Using embedding models for lexical categorization in morphologically rich languages</title>
		<author>
			<persName><forename type="first">B</forename><surname>Siklósi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Computational Linguistics and Intelligent Text Processing: 17th International Conference, CICLing 2016</title>
				<editor>
			<persName><forename type="first">A</forename><surname>Gelbukh</surname></persName>
		</editor>
		<meeting><address><addrLine>Konya, Turkey</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13a">
	<analytic>
		<title level="a" type="main">The comparison of dendrograms by objective methods</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">R</forename><surname>Sokal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>Rohlf</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Taxon</title>
		<imprint>
			<biblScope unit="volume">11</biblScope>
			<biblScope unit="page" from="33" to="40" />
			<date type="published" when="1962">1962</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Longman Dictionary of Contemporary English</title>
		<author>
			<persName><forename type="first">D</forename><surname>Summers</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Longman Dictionary of Contemporary English Series</title>
				<imprint>
			<publisher>Longman</publisher>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Feature-rich part-of-speech tagging with a cyclic dependency network</title>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Klein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Singer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL &apos;03</title>
				<meeting>the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL &apos;03<address><addrLine>Stroudsburg, PA, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="173" to="180" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15a">
	<analytic>
		<title level="a" type="main">Visualizing data using t-SNE</title>
		<author>
			<persName><forename type="first">L</forename><surname>Van Der Maaten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Hinton</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page" from="2579" to="2605" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Hierarchical grouping to optimize an objective function</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">H</forename><surname>Ward</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of the American Statistical Association</title>
		<imprint>
			<biblScope unit="volume">58</biblScope>
			<biblScope unit="issue">301</biblScope>
			<biblScope unit="page" from="236" to="244" />
			<date type="published" when="1963">1963</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
