=Paper=
{{Paper
|id=Vol-2052/paper15
|storemode=property
|title=A Model for High-coverage Lexical Semantic Annotation Generation
|pdfUrl=https://ceur-ws.org/Vol-2052/paper15.pdf
|volume=Vol-2052
|authors=Attila Novák,Borbála Siklósi
|dblpUrl=https://dblp.org/rec/conf/commonsense/NovakS17
}}
==A Model for High-coverage Lexical Semantic Annotation Generation==
Attila Novák, Borbála Siklósi
Pázmány Péter Catholic University, Faculty of Information Technology and Bionics,
MTA-PPKE Hungarian Language Technology Research Group
1083 Práter u. 50/a
Budapest, Hungary
Abstract

AI applications often receive their input in the form of natural language text or as the transcription of spoken text. A commonsense inference system has to transform such input into a formal representation with a limited vocabulary in order to be able to process it. In this paper, we present a method based on neural word embeddings that automatically assigns semantic features to words of natural language. These features either describe the ontological category of a given word or provide some characterization or additional information. We show that our method has high coverage, performs well for English and Hungarian, and can easily be extended to other languages as well.

Introduction

One of the most natural representations of commonsense knowledge is natural language. What people think or know about the world is expressed in either spoken or written language. Due to the popularity and accessibility of on-line media, crowds of people put their knowledge into written texts, either in the form of very short comments on social media sites or in the form of longer posts, in addition to the writings of professional journalists. These texts, which are produced daily, adapt to changes in language use, and not only general knowledge but also facts and beliefs about the actual state of the world are represented in them. Moreover, not only standard language but also slang and words used in informal contexts and special domains are present in texts collected from the Web. In addition, more and more books representing a wide range of domains and styles are digitized. Large written corpora consisting of these resources are available as raw material for research and can be exploited as a source of knowledge.

A more structured form of knowledge representation is hand-crafted ontologies, such as WordNet (Fellbaum 1998; Miller 1995) or DBpedia (Lehmann et al. 2015). In WordNet, concepts are collected into synonym sets and are organized into a strictly hierarchical structure of hyponymy relations, along with some horizontal relations, like meronymy. However, WordNet has been criticized for being too fine-grained at the bottom level and too general at the top level (Brown 2008). Moreover, its middle layers also contain many concepts that may be appropriate in a scientific taxonomy, like ‘fissiped mammal.n’, but are not present in everyday language use. Similar problems concern most other structured knowledge bases. Moreover, since they are extremely costly to produce or to extend to achieve good lexical coverage, these resources are static in nature: they are not able to keep up with changes in language use and daily life, and they contain only standard word forms.

Whatever its source, a knowledge base is an essential component of a commonsense inference system. Even though recent results achieved by applying deep neural systems to raw textual input have been significant, traditional inference systems first transform their input written in natural language into a formal representation using features extracted from one or more knowledge bases, and then they try to solve the given task based on this formal representation. In order to be able to process arbitrary input, the coverage of the knowledge bases used should be as high as possible (Davis 1990).

In this paper, we present an automatic method that is able to assign semantic features or atomic predicates to practically any (even non-standard/slang or misspelled) word form in a text in a language-independent manner. As we apply morphological analysis and lemmatization to the corpus both at the time of generating the embedding models and at query time, all forms of a single lemma are covered instead of only those explicitly present in the original corpus. This is essential to achieve good coverage for an agglutinating language like Hungarian, where a single lexeme may have hundreds of possible word forms, only a few of which are actually present even in a huge corpus. Instead of constructing another static knowledge base of fixed vocabulary, we propose a dynamic tool that can be retrained or fine-tuned at any time using an up-to-date, possibly domain-specific corpus appropriate to the task at hand. The target formalism or set of semantic features to be used is also an interchangeable parameter of the proposed method. The set of features and predicates presented in this paper is derived from formalized definitions of a subset of the headwords (including the defining vocabulary) of the Longman Dictionary of Contemporary English (LDOCE) (Summers 2005). Both the vocabulary of the model and the features used are embedded in a word embedding vector space model created by a neural network (Mikolov et al. 2013).

Before we present the structure of the paper, let the following example illustrate the kind of semantic annotation automatically assigned by the model to the words in the sentence The cow gives milk to her calf.:

    cow: mammal, at_farm, produce_milk, HAS{four(legs)}, animal
    gives: =AGT.CAUSE{=DAT.HAS.=PAT}, give, offer, communicate
    milk: food, sweet, drink, liquid
    calf: young, mammal, animal, has_wool, HAS{four(legs)}
The paper is structured as follows: first, a brief introduction to neural word embeddings is given. This is followed by a description of the lexical resource that we used when creating our models. In the following section, the method of building the model is described. In this paper, the method is demonstrated for English. However, existing semantic resources can also be mapped to word embedding spaces over the vocabulary of other languages: we have performed experiments with Hungarian, an agglutinative language with scarce semantic resources, but the method can easily be applied to other languages as well. Finally, we present both a qualitative and a quantitative evaluation of the models.

Word Embedding Models

Traditional models of distributional semantics build word representations by counting words occurring in a fixed-size context of the target word (Baroni, Dinu, and Kruszewski 2014). In contrast, more recent methods for building distributional representations of words use neural networks to generate word embedding models (Mikolov et al. 2013; Pennington, Socher, and Manning 2014), the most influential implementation of which is word2vec (https://code.google.com/archive/p/word2vec/).

When training embedding models, a fixed-size context of each word in the vocabulary is used as the input of a neural network. This network is used to predict the target word from the context by using back-propagation, adjusting the weights assigned to the connections between the input neurons (each corresponding to an item in the whole vocabulary) and the projection layer of the network. This weight vector can finally be extracted and used as the embedding vector of the target word. Since similar words are used in similar contexts, these vectors, optimized for prediction from the context, will also be similar for similar words. There are two types of neural networks used for this task. One of them is the so-called CBOW (continuous bag-of-words) model, in which the network is used to predict the target word from the context, while the other model, called skip-gram, is used to predict the context from the target word. For both models, the embedding vectors can be extracted from the middle layer of the network and used alike as a dense vector representation of the meaning of the words.

The vectors thus obtained consistently point to certain locations in the semantic space, so that semantically and/or syntactically related words are close to each other, while unrelated ones are more distant. Moreover, it has been shown that vector operations can also be applied to these representations: the semantic relatedness of two words can be quantified via the algebraic difference of the two vectors representing these words, and the meaning of the composition of two (or more) words is generally well represented by the sum of the corresponding embedding vectors (Mikolov, Yih, and Zweig 2013).

As the words are represented as dense real-valued vectors, the similarity of two words can easily be defined as the angle between their vectors, i.e. the most similar words for a query word can be retrieved by finding its nearest neighbours in the vector space according to cosine distance.

One of the main drawbacks of building such a model from raw corpora, however, is that by itself it is not able to handle polysemy and homonymy, because a single representational vector is built for each lexical element regardless of the number of its different senses. We applied a simple method to alleviate this problem, at least in cases where the homonyms have different parts-of-speech. In order to assign different vectors to the same word with different parts-of-speech, we applied PoS-tagging and lemmatization to the training corpora before building the model. The main PoS tag of each word was attached to the word as a suffix in the form lemma#PoS; thus a different embedding vector was created for homonymous lemmas with different parts-of-speech.

We trained an English word embedding model on the English Wikipedia dump of 2.25 billion tokens (8.24 M token types), downloaded from https://dumps.wikimedia.org/ in May 2016, that was annotated using the Stanford tagger (Toutanova et al. 2003). Since the CBOW model has proved to be more efficient for large training corpora, we used this model architecture for training, with the radius of the context window set to 5, the number of dimensions set to 300, and a token frequency limit of 5.

Figure 1 illustrates how the words pianist, teacher, turner, maid and their three nearest neighbors are arranged in the English word embedding space (the PoS tag is NN for all example words and is omitted from the figure). The original vectors consist of 300 dimensions, but these were mapped to a 2D representation using the t-SNE algorithm (van der Maaten and Hinton 2008).

[Figure 1: The arrangement of the 3 nearest neighbors of the words pianist, teacher, turner, maid in the English word embedding space.]
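To make the setup above concrete, the following minimal sketch shows how such a PoS-aware CBOW model could be trained and queried with the gensim library. This is an illustration under stated assumptions rather than the authors' implementation: the corpus file name and its format (one sentence per line, tokens already lemmatized and tagged as lemma#PoS) are hypothetical, while the hyperparameters are the ones reported above.

    # A minimal sketch (not the authors' original code), assuming a corpus
    # that has already been lemmatized and PoS-tagged into lemma#PoS tokens,
    # one sentence per line, e.g. "the#DT cow#NN give#VB milk#NN ..."
    from gensim.models import Word2Vec

    class TaggedCorpus:
        """Streams the pre-tagged corpus so it need not fit in memory."""
        def __init__(self, path):
            self.path = path
        def __iter__(self):
            with open(self.path, encoding='utf-8') as f:
                for line in f:
                    yield line.split()

    corpus = TaggedCorpus('wikipedia.tagged.txt')  # hypothetical file name

    # CBOW (sg=0), context radius 5, 300 dimensions, frequency cut-off 5:
    # the training parameters reported above (gensim >= 4 API).
    model = Word2Vec(corpus, sg=0, window=5, vector_size=300, min_count=5)

    # Nearest neighbours by cosine similarity; homonymous lemmas with
    # different PoS tags (e.g. like#VB vs. like#IN) have separate vectors.
    print(model.wv.most_similar('cow#NN', topn=3))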
Lexical Resources

Our goal was to create a model that can assign semantic features and elementary predicates to words in an arbitrary text. Thus, first, the set of features to be used had to be defined. The Longman Dictionary of Contemporary English (LDOCE) (Summers 2005) is a traditional dictionary containing words and their definitions. All definitions in the dictionary are written using a constrained defining vocabulary, the Longman Defining Vocabulary (LDV). The definitions of a subset of headwords in LDOCE, including all items in the LDV and the most frequent words listed in the BNC and the Google unigram count, were transformed into a formal description containing only unary and binary predicates in a resource called 4lang (Kornai et al. 2015), illustrated by the following examples (for an explanation of the notation used in these definitions, see (Kornai et al. 2015)):

    bread: food, FROM/2742 flour, bake MAKE
    (a type of food made from flour and water that is mixed together and then baked)
    show: =AGT CAUSE[=DAT LOOK =PAT], communicate
    (to let someone see something)
Category                      Example words in 4lang
PART_OF.body                  body#NN, tongue#NN, back#NN, neck#NN, shoulder#NN, bone#NN, skin#NN, wrist#NN, buttock#NN, etc.
=AGT.HAS.mouth                swallow#VB, suck#VB, eat#VB, drink#VB
HAS{four(legs)}               horse#NN, tiger#NN
mammal                        mammal#NN, lion#NN, deer#NN, man#NN, horse#NN, sheep#NN, cattle#NN, rabbit#NN, cat#NN, pig#NN, goat#NN, cow#NN
=AGT.HAS.mind                 read#VB, remember#VB, feel#VB, understand#VB
=AGT.CAUSE{=DAT.KNOW.=PAT}    express#VB, teach#VB

Table 1: Example words for some semantic features (predicates) after transforming the definitions to the format consisting of labels and example words.
We further transformed this format so that we have category labels (here: unary and binary predicates) and listed examples. This was achieved by segmenting the formal descriptions into elementary predicates (by splitting at commas), but we did not segment predicates into further parts, so e.g. HAS{four(legs)} remained an atomic feature. Each such token was treated as a category label. Then, all words that had the particular token in their definition were listed as examples for that label. This resulted in 1489 category labels and 12,507 words listed as examples for them. Then, in order to make this resource compatible with the word embedding model built from the Wikipedia corpus, its vocabulary was intersected with that of the model. Even though the vocabulary of this resource consists mostly of frequent words used in LDOCE definitions, it also includes some affixes, inflected forms, and a few multiword items, which are not present in the lemmatized Wikipedia model, so the intersection resulted in 11,039 words. Table 1 shows some example words for some features derived from the 4lang resource.

However, some categories were too broad and the set of words listed for them was too heterogeneous. To handle this problem, a hierarchical agglomerative clustering algorithm was applied to the set of words in those categories that contained at least five words. The reason for applying hierarchical clustering rather than k-means is based on the argument of Pereira, Tishby, and Lee (1993), who state that due to the sophisticated variability of written texts, the number of clusters of the concepts used in a certain text cannot be predicted. A hierarchical organization, however, is appropriate for producing compact groups of words and phrases based on the actual text, rather than on some predefined generalization. The linkage method for the hierarchical clustering was chosen based on the cophenetic correlation between the original data points and the resulting linkage matrix (Sokal and Rohlf 1962). The best correlation was achieved when using Ward's distance criterion (Ward 1963), resulting in small and dense groups of terms at the lower levels of the resulting dendrogram. However, we did not need the whole hierarchy, represented as a binary tree, but separate, compact groups of terms, i.e. well-separated subtrees of the dendrogram. The most intuitive way of defining the cutting points of the tree is to find large jumps in the clustering levels. To put it more formally, the height of each link in the cluster tree is compared with the heights of the neighbouring links below it up to a certain depth. If this difference is larger than a predefined threshold value (i.e. the link is inconsistent), then the link is a cutting point. For more details of the clustering algorithm, see (Siklósi 2016). Each cluster was then labeled with the original category label with a numeric index added.

Even though we present our method using only the 4lang dictionary as a lexical resource, the system can be built from any dictionary that can be transformed to a similar format.

Method

Our objective was to create a model with high lexical coverage that can also return the most relevant semantic features for words not present in 4lang. In order to achieve this goal, the semantic features from this controlled set were projected into the embedding space containing the representation of the words. The nearest feature neighbors of each word can then be retrieved from the model using the cosine distance metric.

For each indexed semantic predicate label output by the clustering algorithm, we iterated over the list of example words annotated with their part-of-speech (the crude PoS tags used in the 4lang resource had to be mapped to the more fine-grained PTB tags returned by the Stanford tagger) and retrieved their embedding vectors from the word embedding model built from the PoS-tagged Wikipedia corpus. As a simple but effective method for rendering a representation vector for a set of words with their corresponding word embeddings, we took the mean of these vectors and used it as the embedding vector of that particular semantic feature.
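As a sketch of how the steps above fit together, the fragment below builds the label-to-example-words lists by splitting 4lang-style definitions at commas, splits overly broad categories with SciPy's hierarchical clustering and inconsistency criterion, and averages the member vectors of each cluster into a feature vector. The file name, the inconsistency threshold, and the depth parameter are illustrative assumptions, not the authors' actual values; model is the embedding model from the earlier sketch.

    # Illustrative sketch of the feature-vector construction described above.
    from collections import defaultdict

    import numpy as np
    from scipy.cluster.hierarchy import fcluster, linkage

    # 1. Segment each formal definition at commas; every elementary predicate
    #    becomes a category label listing the defined word as an example.
    label2words = defaultdict(list)
    with open('4lang_definitions.txt', encoding='utf-8') as f:  # hypothetical
        for line in f:  # e.g. "bread: food, FROM/2742 flour, bake MAKE"
            word, definition = line.split(':', 1)
            for label in definition.split(','):
                label2words[label.strip()].append(word.strip())

    # 2. Keep only example words present in the embedding model, split broad
    #    categories by hierarchical clustering, and average member vectors.
    feature_vectors = {}
    for label, words in label2words.items():
        vecs = np.array([model.wv[w] for w in words if w in model.wv])
        if len(vecs) == 0:
            continue  # nothing left after the vocabulary intersection
        if len(vecs) < 5:
            feature_vectors[label] = vecs.mean(axis=0)  # too small to cluster
            continue
        Z = linkage(vecs, method='ward')  # Ward linkage, as in the paper
        ids = fcluster(Z, t=1.0, criterion='inconsistent', depth=2)
        for cid in np.unique(ids):
            # each subcluster becomes a separately indexed feature label
            feature_vectors[f'{label}/{cid}'] = vecs[ids == cid].mean(axis=0)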
Original word   Analyzed word   Features
Laika           Laika#NNP       carnivorous mammal faithful HAS.short(hair/3359) HAS{four(legs)} ⟨AT/2744.farm⟩ companion young EAT.flesh HAS.long(tail)
likes           like#VB         want =PAT{person} wish emotion ask =AGT.HAS.mind annoy =PAT.IN/2758.mind communicate desire =AGT.HAS.body
eating          eat#VB          swallow =AGT.HAS.mouth eat love INSTRUMENT.tongue =AGT.CAUSE{=PAT{move}} sleep suck sing touch rest
fried           fried#JJ        food ’.COOK/825 ’.SERVE thick/2134 FROM/2742.flour bake.MAKE FROM/2742.milk food.IN/2758 vegetable sweet bread
onion           onion#NN        ’.COOK/825 vegetable fruit food FROM/2742.milk sweet round soft thick/2134 PART_OF.plant
with            with#IN
cucumber        cucumber#NN     vegetable fruit food ’.COOK/825 sweet ’.EAT round CAUSE{food.HAS.taste} PART_OF.plant soft

Table 2: An example sentence, Laika likes eating fried onion with cucumber, with features assigned to each word using our method.
Original word   Analyzed word   Hypernyms
Laika           Laika#NNP
likes           like#VB         desire, want
eating          eat#VB          consume, digest, take in, take, have
fried           fried#JJ
onion           onion#NN        vegetable, produce, food, solid, matter, physical entity, entity
with            with#IN
cucumber        cucumber#NN     vegetable, produce, food, solid, matter, physical entity, entity

Table 3: An example sentence, Laika likes eating fried onion with cucumber, with hypernyms from WordNet assigned to each word.
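As an aside, hypernym chains like those in Table 3 can be read out of WordNet, e.g. with NLTK; the sketch below picks the first noun synset and its first hypernym path, which is an illustrative simplification rather than the authors' documented procedure.

    # A sketch of reading WordNet hypernym chains as in Table 3 (using NLTK);
    # taking the first synset and the first path is a simplification.
    from nltk.corpus import wordnet as wn

    def hypernym_chain(lemma, pos=wn.NOUN):
        synsets = wn.synsets(lemma, pos=pos)
        if not synsets:
            return []  # e.g. 'Laika' and the adjective 'fried' are missing
        path = synsets[0].hypernym_paths()[0]  # root ('entity') ... -> synset
        # walk upwards from the immediate hypernym to the root
        return [s.lemmas()[0].name() for s in reversed(path[:-1])]

    # For the food sense of 'onion' this yields the chain of Table 3
    # (with underscores): vegetable, produce, food, solid, matter,
    # physical_entity, entity; other senses yield other chains.
    print(hypernym_chain('onion'))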
Thus a representation of each predicate used in the definitions was obtained in the semantic space created from the English PoS-tagged corpus. These semantic feature vectors were kept separate from the word vectors in the original embedding model in order to be able to restrict lookup to either words or features derived from each lexical resource.

To find the relevant features for a query word tagged with its appropriate part-of-speech, its representational vector is retrieved from the word embedding model and its nearest neighbors are taken from the model containing the semantic predicates. Since nearest neighbors are searched for instead of exact matches, out-of-vocabulary words (with respect to the original lexical resources) can also be assigned semantic labels. The only requirement is that the word must be present in the word embedding model.

Other languages

We also carried out some experiments applying our method to another language, Hungarian. Hungarian is an agglutinative language with very few lexical semantic resources. As the original 4lang dictionary contained the Hungarian translation of the vocabulary included (3477 words), it was straightforward to create a similar model for Hungarian as well. For this, we had to create a Hungarian word embedding model, which was built from a web-crawled corpus of 3.18 billion tokens (27.49 M token types) annotated using the PurePos tagger (Orosz and Novák 2013), augmented with the Humor Hungarian morphological analyzer (Novák 2014; Novák, Siklósi, and Oravecz 2016). We applied the method described above to define the position of the features in the Hungarian word embedding space by calculating the mean of the vector representations of the Hungarian example words for each semantic predicate. Our approach can easily be extended to any other language by translating this dictionary of moderate size (relative to complicated knowledge bases). Furthermore, this method also adapts to differences in word usage across languages, since words are represented by their embedding vectors in the target language.

Experiments and Results

The aim of this research was to investigate the possibility of providing a high-coverage tool for assigning a semantic representation to the words of a natural language input dynamically, instead of using a static knowledge base with a limited vocabulary. Thus, first we investigated the performance of the tool on some example input, then we also performed a quantitative analysis.

Qualitative analysis

Table 2 shows an example: Laika likes eating fried onion with cucumber. First, using the Stanford parser, the input is annotated with part-of-speech tags and each word is lemmatized. Then, for each lemmatized content word (i.e. omitting the function word with) with its corresponding part-of-speech, the top 10 nearest features are retrieved from the model, ordered by their distance from the vector representing the target word in the embedding space. Note that the number n of top features generated for each word is a free parameter, but moving further away in the semantic space results in less and less appropriate features for the target word. Table 3 shows the WordNet hypernyms assigned to each content word in the same sentence (the representations of the adjective fried and the proper name Laika are missing from WordNet).
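A compact sketch of this annotation step, reusing the trained model and the feature_vectors dictionary from the earlier fragments; keeping the features in a separate vector store mirrors the lookup restriction described in the Method section, and the helper names and input format are illustrative assumptions.

    # Sketch of the per-word annotation described above; assumes the trained
    # 'model' and the 'feature_vectors' dict from the earlier fragments.
    from gensim.models import KeyedVectors

    # Keep feature vectors separate from word vectors so that lookups can be
    # restricted to semantic features only.
    feature_space = KeyedVectors(vector_size=model.wv.vector_size)
    feature_space.add_vectors(list(feature_vectors.keys()),
                              list(feature_vectors.values()))

    def annotate(tagged_lemmas, topn=10):
        """tagged_lemmas: lemmatized content words such as 'cucumber#NN'."""
        annotation = {}
        for w in tagged_lemmas:
            if w in model.wv:  # the only requirement: the word is in the model
                annotation[w] = feature_space.similar_by_vector(model.wv[w],
                                                                topn=topn)
        return annotation

    print(annotate(['Laika#NNP', 'like#VB', 'eat#VB', 'onion#NN',
                    'cucumber#NN']))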
As can be seen in the example, our model is able to assign two types of features to words. Ontological/taxonomic categories, such as carnivorous and mammal for the word Laika, or vegetable and food for the words onion and cucumber, appear together with characteristic features of the given concept, such as faithful, HAS{four(legs)}, ⟨AT/2744.farm⟩, round, or CAUSE{food.HAS.taste}. While the first type of features can be extracted from traditional ontologies, the latter type of characteristics cannot. However, we believe that such characteristics form an important part of commonsense knowledge, because if people are asked to describe a concept, they will rather use characteristics of this kind. Moreover, an inference system can also benefit from such descriptions. It can also be seen from the example that the model "knows" that Laika is a dog, as it returns semantic features characterizing dogs. In addition, the feature EAT.flesh emphasizes the contrast between Laika being a dog and eating cucumber and onion.

Another benefit of our model, as mentioned above, is that it is able to generate features for all the words that are present in the original corpus the word embedding was built from, not only for the extremely limited set of words included in the 4lang dictionary. WordNet or other hand-made resources are limited to the words and the classification that the designer of the resource had in mind. Our model, in contrast, is able to assign features to proper names, slang words or mistyped word forms as well, as long as these are represented in the corpus the word embedding model was created from. In addition to the above example containing the dog name Laika, the following examples show some of the nearest features for two more proper names and two slang words:

    IBM: information.IN, computer, equipment, electric, group
    Facebook: information.ON, ABOUT.recent(events), computer
    hype: fame, fun, idea, popular, surprise
    numpty: bad, lazy, stupid, lack(work), dull

A weakness of our method is that in some cases it also adds noise to the generated features. For example, features such as sleep or sing generated for the verb eat are not ones we would expect to be part of the definition of eat (even if in a broader sense they might be related). Inappropriate features like these may be eliminated manually from the representations generated by the model. The model can thus also be used as an aid in a semi-automatic semantic resource creation/extension process, proposing an initial representation that can be cleaned manually for applications that require a high-precision lexical semantic representation. Otherwise, the generated semantic features can be used in models performing some downstream task even without filtering out the noise. In that case, the added semantic features may improve the performance of the downstream tool by providing mostly useful features for words that would otherwise completely lack a semantic representation.

Quantitative analysis

We also carried out two kinds of quantitative analysis of the performance of our model. First, we checked the robustness of the model by performing a sanity check. For each word present in the original 4lang dictionary, we calculated how many of the semantic features present in the original definition were retrieved among the top N features returned by the model (feature recall, Rf) and the percentage of words for which all features were retrieved (word recall, Rw). The results are shown in Table 4 as a function of N (numbers are percentages). Recall was also calculated ignoring words having more than N features (Rw^p) and discounting features over the N limit for words having more than N features (Rf^p). As no definition contained more than 10 terms, Rw^p is identical to Rw and Rf^p is identical to Rf for N ≥ 10. The definitions are terse and contain a minimal description for each word: half of the words have only a single term, and almost all have no more than 5 (see column |f| ≤ N). Feature precision (P(f)) apparently decreases quickly as the number of retrieved features increases if we blindly accept only terms present in the original definitions as correct; see, however, the further discussion below. The last column of the table shows the mean average precision (MAP) of the features (terms) present in the original definitions.

    N    Rw      Rw^p    Rf      Rf^p    |f| ≤ N   P(f)    MAP
    1    44.11   88.18   50.79   92.66   50.02     92.66   92.66
    5    86.88   87.75   91.38   92.26   99.00     56.70   89.66
    10   93.39   93.39   95.97   95.97   100.00    32.70   90.56
    15   95.61   95.61   97.36   97.36   100.00    22.89   90.77
    20   96.48   96.48   97.93   97.93   100.00    17.54   90.82

Table 4: Performance of the model for English tested on definitions in the 4lang vocabulary as a function of the number N of top-ranked features retrieved for each word. Rw: word recall (words for which all features were retrieved), Rw^p: recall for words having no more than N features, Rf: feature recall, Rf^p: feature recall ignoring features over the top N, |f| ≤ N: percentage of words having no more than N features, P(f): feature precision, MAP: mean average precision of features. Numbers are percentages.
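For concreteness, the sanity-check metrics can be computed along the following lines, where gold (word → set of definition terms) and ranked (word → ranked list of retrieved features) are hypothetical stand-ins for the data described above, not the authors' evaluation code.

    # Sketch of the sanity-check metrics described above.
    def feature_recall(gold, ranked, n):
        """Rf: percentage of definition terms found among the top-n features."""
        hits = total = 0
        for word, terms in gold.items():
            top = set(ranked[word][:n])
            hits += len(terms & top)
            total += len(terms)
        return 100.0 * hits / total

    def word_recall(gold, ranked, n):
        """Rw: percentage of words whose definition terms were all retrieved."""
        ok = [terms <= set(ranked[word][:n]) for word, terms in gold.items()]
        return 100.0 * sum(ok) / len(ok)

    def mean_average_precision(gold, ranked):
        """MAP of the ranked feature lists, with definition terms as the
        relevant items."""
        aps = []
        for word, terms in gold.items():
            hits, precisions = 0, []
            for i, feat in enumerate(ranked[word], start=1):
                if feat in terms:
                    hits += 1
                    precisions.append(hits / i)
            aps.append(sum(precisions) / len(terms) if terms else 0.0)
        return 100.0 * sum(aps) / len(aps)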
In the other experiment, we randomly selected 280 words not present in the original dictionary from a predefined list of Hungarian words in which each word was assigned to one of 28 semantic domains (e.g. food, vehicles, locations, occupations, etc.). From each domain, 10 words were chosen randomly and translated to English. Then, for these words, the 10 nearest features were generated, and two human annotators checked whether each feature was adequate for the given word. The same evaluation was performed for Hungarian. The agreement between the annotators was 0.798 for English and 0.734 for Hungarian according to Cohen's kappa, which is substantial in both cases. The results are shown in Table 5.

    Language    acc     d-acc   #F   #B
    English     75.13%  90.07%  559  277
    Hungarian   73.86%  88.34%  584  295

Table 5: Performance of the model on 280 different test words for English and Hungarian. acc: feature accuracy, d-acc: domain accuracy of features, #F: different features, #B: features marked wrong at least once.

The table shows feature accuracy (acc: the ratio of correctly assigned features) in each domain. We also automatically computed feature "domain accuracy" (d-acc): here we ignored feature assignment errors where the same feature was marked adequate for another test word in the same domain. The number of different features that appeared in this evaluation and the number of features marked wrong at least once are shown in the last two columns. Note that the feature accuracy (precision) for 10 retrieved features turned out to be much higher (75.13%) than in the sanity check experiment (only 32.70%), even though this list contained words not present in the original resource. The reason for this is that the model returns many features which, while not explicitly present in the original terse definitions, correctly follow from the knowledge embodied in the feature model. For example, while the definition of dog in 4lang contains only 3 terms (animal, faithful and carnivorous), the top 10 features retrieved from the model also include mammal, HAS{four(legs)}, hairy and companion. The sanity check experiment thus grossly underestimated the precision of the model.
Conclusion

We have presented an automatic method that is able to assign semantic features to words of natural language. This approach exploits the representative power of neural word embeddings by mapping features derived from formal definitions of words to the vector space of the given language. In addition to some illustrative examples, we have presented an evaluation of the models, demonstrating that the method works with relatively high accuracy. Although there is a moderate amount of noise in the set of generated features, the method has very high coverage, being able to process proper names and non-standard words as well, which cannot all be included in hand-made static knowledge bases. As such, our automatic method can be used as the base of a manually constructed resource, or can provide valuable input for downstream applications, such as commonsense inference systems.

Acknowledgments

This research has been implemented with support provided by grant FK125217 of the National Research, Development and Innovation Office of Hungary, financed under the FK17 funding scheme.

References
Baroni, M.; Dinu, G.; and Kruszewski, G. 2014. Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 238–247. Baltimore, Maryland: Association for Computational Linguistics.

Brown, S. W. 2008. Choosing sense distinctions for WSD: Psycholinguistic evidence. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers, HLT-Short '08, 249–252. Stroudsburg, PA, USA: Association for Computational Linguistics.

Davis, E. 1990. Representations of Commonsense Knowledge. Morgan Kaufmann.

Fellbaum, C., ed. 1998. WordNet: An Electronic Lexical Database. MIT Press.

Kornai, A.; Ács, J.; Makrai, M.; Nemeskey, D. M.; Pajkossy, K.; and Recski, G. 2015. Competence in lexical semantics. In Proceedings of the Fourth Joint Conference on Lexical and Computational Semantics, 165–175. Denver, Colorado: Association for Computational Linguistics.

Lehmann, J.; Isele, R.; Jakob, M.; Jentzsch, A.; Kontokostas, D.; Mendes, P. N.; Hellmann, S.; Morsey, M.; van Kleef, P.; Auer, S.; and Bizer, C. 2015. DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web Journal 6(2):167–195.

Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26 (NIPS 2013), 3111–3119. Lake Tahoe, Nevada, USA.

Mikolov, T.; Yih, W.; and Zweig, G. 2013. Linguistic regularities in continuous space word representations. In Human Language Technologies: Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2013), 746–751. Atlanta, Georgia, USA.

Miller, G. A. 1995. WordNet: A lexical database for English. Communications of the ACM 38(11):39–41.

Novák, A. 2014. A new form of humor – mapping constraint-based computational morphologies to a finite-state representation. In Calzolari, N.; Choukri, K.; Declerck, T.; Loftsson, H.; Maegaard, B.; Mariani, J.; Moreno, A.; Odijk, J.; and Piperidis, S., eds., Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland: European Language Resources Association (ELRA).

Novák, A.; Siklósi, B.; and Oravecz, C. 2016. A new integrated open-source morphological analyzer for Hungarian. In Calzolari, N.; Choukri, K.; Declerck, T.; Goggi, S.; Grobelnik, M.; Maegaard, B.; Mariani, J.; Mazo, H.; Moreno, A.; Odijk, J.; and Piperidis, S., eds., Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). Paris, France: European Language Resources Association (ELRA).

Orosz, G., and Novák, A. 2013. PurePos 2.0: a hybrid tool for morphological disambiguation. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2013), 539–545. Hissar, Bulgaria: INCOMA Ltd.

Pennington, J.; Socher, R.; and Manning, C. D. 2014. GloVe: Global vectors for word representation. In Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.

Pereira, F.; Tishby, N.; and Lee, L. 1993. Distributional clustering of English words. In Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, ACL '93, 183–190. Stroudsburg, PA, USA: Association for Computational Linguistics.

Siklósi, B. 2016. Using embedding models for lexical categorization in morphologically rich languages. In Gelbukh, A., ed., Computational Linguistics and Intelligent Text Processing: 17th International Conference, CICLing 2016, Konya, Turkey. Cham: Springer International Publishing.

Sokal, R. R., and Rohlf, F. J. 1962. The comparison of dendrograms by objective methods. Taxon 11(2):33–40.

Summers, D. 2005. Longman Dictionary of Contemporary English. Harlow: Longman.

Toutanova, K.; Klein, D.; Manning, C. D.; and Singer, Y. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (NAACL '03), 173–180. Stroudsburg, PA, USA: Association for Computational Linguistics.

van der Maaten, L., and Hinton, G. E. 2008. Visualizing high-dimensional data using t-SNE. Journal of Machine Learning Research 9:2579–2605.

Ward, J. H. 1963. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association 58(301):236–244.