=Paper=
{{Paper
|id=Vol-2473/paper25
|storemode=property
|title=Extending Czech Thesauri using Word-formation Network
|pdfUrl=https://ceur-ws.org/Vol-2473/paper25.pdf
|volume=Vol-2473
|authors=Karolína Hořeňovská
|dblpUrl=https://dblp.org/rec/conf/itat/Horenovska19
}}
==Extending Czech Thesauri using Word-formation Network==
<pdf width="1500px">https://ceur-ws.org/Vol-2473/paper25.pdf</pdf>
<pre>
Extending Czech thesauri using word-formation network

Karolína Hořeňovská

Charles University, Faculty of Mathematics and Physics
horenovska@ufal.mff.cuni.cz

Abstract: In this paper, we attempt to extend existing Czech thesauri by using a word-formation network, DeriNet. Thesauri are an important resource for synonym retrieval and substitution generation, but their lexical sparsity is an issue in Czech. We discuss the properties of existing thesauri and DeriNet and propose several ways of using DeriNet to extend the thesauri, such as deriving a synonym of an adverb from a synonym of the corresponding adjective. We also evaluate some of our proposals.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

A lot of effort has been invested in creating large thesauri, of which the best-known example is probably WordNet [11], followed by others such as FrameNet [2]. While these thesauri address English, there are many thesauri for other languages as well (e.g. WordNet versions for Arabic [4], Swedish [23] or Czech [16]). We single out Czech WordNet, since Czech is the language we currently deal with.

However, these thesauri are heavily incomplete for some languages, including Czech. This incompleteness presents a problem for various NLP tasks, e.g. substitution generation as part of lexical simplification (see [20] or [19] for more detail).

On the other hand, for some languages (including Czech), a rich word-formation network is available. We propose using such a network to extend existing thesauri, i.e. to discover synonymy relations between new pairs of words. Note that while we target synonymy, as it is the only relation covered by all existing Czech thesauri, the approach could be applied to any relation.

The rest of the paper is organized as follows: we briefly describe related work (section 2), present existing Czech thesauri (section 3) and describe the Czech word-formation network DeriNet (section 4). We then introduce several ways of combining DeriNet with thesauri to produce new relations (section 5) and evaluate the most promising of them (section 6).

2 Related work

Since thesauri are generally incomplete, there have been many attempts at extending them automatically. These attempts have included aligning multilingual resources (e.g. [22]), mining Wikipedia ([1], [24]) or the web in general ([10]), translating English WordNet (which has been tried specifically for Czech [5], although, to the best of our knowledge, the resulting extension is not publicly available), making use of word embeddings ([17], [7]) and employing derivational morphology ([8], [12]).

This paper is similar in nature to a previous attempt at extending Czech WordNet with derivational relations [14], which its authors report to be successful. Unlike them, we use a publicly available source of derivations, and we do not limit ourselves to WordNet: we try various thesauri and compare the outcome obtained with each of them and with their combination. We also present a more thorough evaluation of the resulting pairs.

3 Existing thesauri

We are aware of five notable Czech thesauri:

  • the most recent version of Czech WordNet [16];
  • a slightly divergent version of Czech WordNet [13], which lacks some synsets but contains some others created to enable the lexico-semantic annotation of the Prague Dependency Treebank [3]; we refer to it as WordNet (PDT);
  • the thesaurus formerly distributed as part of the office suite LibreOffice;
  • Czech Wiktionary; and
  • the ÚFAL thesaurus, developed at our department (Department in table 1).

Both WordNet versions use synsets explicitly: each synset represents a meaning and lists literals (words or phrases) which can be used to express it. Synsets may include a definition of the meaning, but few have it filled in.

The other three thesauri employ synsets implicitly, either by assigning each word a set of synonym sets (as in the LibreOffice thesaurus and Wiktionary) or by listing synonym sets, with some words included in more than one set (as in our department thesaurus).

We perform our experiments both with each thesaurus individually and with a concatenated thesaurus, i.e. an artificial thesaurus created by concatenating all synsets from all of the real thesauri.
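The two preparatory steps, concatenating synsets across thesauri and then (as described in the following paragraphs) merging each word’s synonyms over all synsets regardless of context, can be sketched in Python as follows. This is a minimal illustration, not the code used for the experiments. It assumes each thesaurus has already been converted into a list of synsets, with each synset a set of single-word literals. The function names `concatenate` and `merge_synonyms` and the toy data are hypothetical.

```python
from collections import defaultdict

# Assumed input format: each thesaurus is a list of synsets and each synset
# is a frozenset of single-word literals (type hints need Python 3.9+).
Synset = frozenset[str]


def concatenate(*thesauri: list[Synset]) -> list[Synset]:
    """Build the concatenated thesaurus: every synset of every input, unchanged."""
    return [synset for thesaurus in thesauri for synset in thesaurus]


def merge_synonyms(synsets: list[Synset]) -> dict[str, set[str]]:
    """Map each word to the union of its synonyms over all synsets containing it."""
    synonyms: defaultdict[str, set[str]] = defaultdict(set)
    for synset in synsets:
        for word in synset:
            # Context is ignored: all other literals of the synset become synonyms.
            synonyms[word] |= synset - {word}
    return dict(synonyms)


# Hypothetical toy thesauri (not taken from the real resources).
wordnet = [frozenset({"rychle", "spěšně"})]
libreoffice = [frozenset({"rychle", "hbitě"}), frozenset({"učitel", "pedagog"})]

merged = merge_synonyms(concatenate(wordnet, libreoffice))
print(sorted(merged["rychle"]))  # ['hbitě', 'spěšně']
```

Keeping synsets intact in `concatenate` and discarding them only in `merge_synonyms` mirrors the remark that the merging step is not crucial: a synset-aware variant would simply skip the second function.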
In our work, we do not make use of synsets. For each word, we merge its synonyms from all synsets into a single set of its synonyms regardless of context, i.e. words which share its meaning at least in some contexts. This is partly to simplify the proof of concept and partly because senses in both WordNets are much more fine-grained than senses in the other resources.

However, this step is in no way crucial. One could keep the synsets and, whenever we refer to retrieving synonyms, first retrieve the synsets and only then the words (either from specific synsets or from all of them). We expect to do this in our future work.

Further statistics about the thesauri are provided in table 1. The Concatenated row corresponds to concatenating all synsets. Note that we only work with single-word expressions (SWE), as opposed to multi-word expressions (MWE).

4 DeriNet

DeriNet [18], [25] is a Czech word-formation network. Its nodes are Czech lexemes, i.e. lemmata, and the nodes do not have to cover all senses. The authors report having taken a rather minimalistic approach to polysemy: a lemma is represented by more than one node only if at least one of two conditions is met. Either it was coincidentally derived from two different words (as with the verb proudit, which is represented both as a base word, though it is likely related to the noun proud ’flow’, and as a verb derived from udit ’to smoke’, when proudit refers to smoking something thoroughly), or its senses lead to different sets of derived words (e.g. the verb stát ’to stand, to melt away’).

The directed edges represent the fact that one word is derived from the other. The edges should be taken as implicative: an edge implies a derivation, but a missing edge does not rule one out, since some derivations might not be captured in DeriNet (yet). The edges are discovered using a variety of methods, including manual deduction, rule-based automated processes and machine learning; many of them were also taken from the MorfFlex CZ morphological dictionary [6]. All discovered edges are manually confirmed before being added to the network.

By the authors’ design decision, no word is allowed to have more than one parent, which simplifies the structure and could be justified by the low occurrence of compounds in Czech. Even though only one parent is allowed, recent versions of DeriNet can mark a node as a compound in its part-of-speech specification.

The then-current version of the network (1.7) contains 1,027,655 nodes, though only some of them are supported by corpus evidence: when comparing it with the SYN v4 version of the Czech National Corpus [9], we found that as many as 591,486 nodes (i.e. more than half) do not occur in the corpus. For the first version of DeriNet, only words which occurred at least twice in a SYN subcorpus of the Czech National Corpus (and fulfilled a few other conditions) were inserted into the network; this condition does not hold for lemmata inserted from the MorfFlex CZ dictionary. Of all nodes, 104,563 (approx. 10 %) are isolated, i.e. not connected to any other node.

Apart from the parent and the part of speech, there is no further annotation, i.e. one cannot learn, for example, that a derived noun is the agent noun of its base verb. The DeriNet format is therefore fairly simple: for each node, it gives the node ID, its lemma and technical lemma (which contains some additional details such as sense disambiguation), its part of speech (possibly with the above-mentioned compound indication) and its parent’s ID (if the node has a parent).

5 Proposed thesauri extensions

We propose the following principle for discovering new word relations:

  1. Find a non-root node A (i.e. a node which has a parent).
  2. Get A’s parent, B.
  3. Retrieve B’s synonyms using the existing thesauri.
  4. Find all nodes C which correspond to the retrieved synonyms.
  5. For each C, check whether it has a child D which shares the requested features with A.
  6. Declare A and D a related word pair.

This outline does not specify how to handle the situation when, for a single C, more than one D shares the given features with A. In our experiments, we opted for choosing none of them (i.e. skipping the whole subtree of C), but one could also develop strategies to select the best D or to generate more pairs for A from a single C.

Note that due to this decision, discovering synonymous word pairs is not symmetric: a word pair might be discovered when starting from one word, but not when starting from the other.

We further suggest constraining all of A, B, C and D to improve the reliability of the discovered relations, e.g. by constraining their part of speech. While part of speech is the only feature available in DeriNet itself, we can use e.g. the MorfFlex CZ dictionary or the MorphoDiTa tool for morphological analysis [21] to obtain more features.

One could be tempted to only search for those non-root nodes A which are not covered by any thesaurus, the reasoning being that covered nodes already have their synonyms in the thesauri. However, thesaurus entries for individual words are often incomplete, and the outlined process could still find new synonyms for node A even if A is present in a thesaurus. Furthermore, considering only nodes A which are not covered by thesauri could lead to a decrease in the number of retrieved pairs after adding a new thesaurus, as some nodes would newly be skipped. We therefore do not constrain node A by its presence or absence in the thesauri.
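The six-step principle above, including the policy of skipping C when more than one candidate D exists, can be sketched in Python. This is a hypothetical illustration rather than the implementation used in the experiments. Nodes are modelled after the DeriNet fields described in section 4 (ID, lemma, part of speech, parent ID). The part-of-speech letters N, A, V, D follow table 1 and are an assumption about the tagset. The thesaurus is assumed to be already merged into a word-to-synonyms mapping. "Shares the requested features with A" is simplified here to satisfying the same role constraint and having the same part of speech. The toy run uses the adverb/adjective constraints proposed in section 5.1.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Node:
    """A DeriNet-like node; only the fields listed in section 4 are modelled."""
    node_id: int
    lemma: str
    pos: str                       # assumed tags: N, A, V, D (as in table 1)
    parent: Optional[int] = None   # None for root nodes


def derive_pairs(
    nodes: dict[int, Node],
    synonyms: dict[str, set[str]],
    derived_ok: Callable[[Node], bool],  # constraint on A and D
    base_ok: Callable[[Node], bool],     # constraint on B and C
) -> set[tuple[str, str]]:
    """Directed (A, D) pairs found by the six-step principle (hypothetical sketch)."""
    by_lemma: defaultdict[str, list[Node]] = defaultdict(list)
    children: defaultdict[int, list[Node]] = defaultdict(list)
    for node in nodes.values():
        by_lemma[node.lemma].append(node)
        if node.parent is not None:
            children[node.parent].append(node)

    pairs: set[tuple[str, str]] = set()
    for a in nodes.values():
        # Step 1: a non-root node A that satisfies the constraint on A/D.
        if a.parent is None or not derived_ok(a):
            continue
        b = nodes[a.parent]  # Step 2: A's parent B.
        if not base_ok(b):
            continue
        for synonym in synonyms.get(b.lemma, set()):  # Step 3: B's synonyms.
            for c in by_lemma.get(synonym, []):        # Step 4: nodes C.
                if not base_ok(c):
                    continue
                # Step 5: children D of C sharing the requested features with A.
                candidates = [d for d in children.get(c.node_id, [])
                              if derived_ok(d) and d.pos == a.pos]
                # Skip C when D is ambiguous, as in the experiments described above.
                if len(candidates) == 1:
                    pairs.add((a.lemma, candidates[0].lemma))  # Step 6.
    return pairs


def is_adverb(node: Node) -> bool:
    return node.pos == "D"


def is_adjective(node: Node) -> bool:
    return node.pos == "A"


# Toy data (hypothetical); the thesaurus lists the synonymy in one direction only.
nodes = {
    1: Node(1, "rychlý", "A"),
    2: Node(2, "rychle", "D", parent=1),
    3: Node(3, "hbitý", "A"),
    4: Node(4, "hbitě", "D", parent=3),
}
thesaurus = {"rychlý": {"hbitý"}}
print(derive_pairs(nodes, thesaurus, is_adverb, is_adjective))  # {('rychle', 'hbitě')}
```

Because the toy thesaurus lists rychlý → hbitý but not the reverse, only one direction is found. This illustrates the asymmetry noted above. Passing the role constraints as functions lets the same loop serve each strategy of section 5 by swapping predicates.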
Thesaurus        Synsets      MWE      SWE     Pairs   1+ synonyms     N     A     V     D
Department        18,211        9   21,382    40,215        21,382   46%   27%   23%    4%
LibreOffice       39,460   17,310   32,085   118,243        32,085   42%   26%   24%    4%
WordNet (PDT)     23,094    7,696   17,599    18,015        10,962   63%    9%   27%    1%
WordNet           28,459   10,846   21,275    20,067        14,095   65%    9%   24%    1%
Wiktionary        43,138    1,038   40,319    16,657        11,268   62%   20%   15%    3%
Concatenated     152,362   28,463   69,064   162,709        43,516   54%   22%   20%    3%

Table 1: Basic statistics of thesauri: numbers of synsets, multi-word (MWE) and single-word (SWE) expressions, unique synonym pairs (a synonym pair retrievable from more than one synset is counted only once) and words with at least 1 synonym; POS distribution for nouns (N), adjectives (A), verbs (V) and adverbs (D) (the distribution need not sum to 100 % because some thesauri also contain other POS such as prepositions)
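The kind of counts reported in table 1 can be illustrated with a short sketch. It rests on assumptions the paper does not state. A literal is treated as a multi-word expression if it contains a space. Synonym pairs are counted as unordered pairs of single-word literals that co-occur in a synset. The toy synsets are hypothetical. The function name `thesaurus_stats` is also hypothetical.

```python
from itertools import combinations


def thesaurus_stats(synsets: list[set[str]]) -> dict[str, int]:
    """Table-1-style counts for a thesaurus given as a list of synsets (sketch)."""
    literals = {lit for synset in synsets for lit in synset}
    # Assumption: a literal containing a space is a multi-word expression.
    mwe = {lit for lit in literals if " " in lit}
    swe = literals - mwe
    pairs: set[tuple[str, str]] = set()
    for synset in synsets:
        words = sorted(lit for lit in synset if lit in swe)
        # combinations() over a sorted list gives each unordered pair in one
        # canonical order, so a pair found in several synsets is counted once.
        pairs.update(combinations(words, 2))
    with_synonym = {word for pair in pairs for word in pair}
    return {
        "synsets": len(synsets),
        "MWE": len(mwe),
        "SWE": len(swe),
        "pairs": len(pairs),
        "1+ synonyms": len(with_synonym),
    }


# Hypothetical toy thesaurus.
toy = [
    {"rychlý", "hbitý"},
    {"rychlý", "hbitý", "svižný"},
    {"učitel"},
    {"začít", "dát se do"},
]
stats = thesaurus_stats(toy)
print(stats)
# {'synsets': 4, 'MWE': 1, 'SWE': 5, 'pairs': 3, '1+ synonyms': 3}
```

The pair hbitý/rychlý occurs in two synsets but is counted once. The single-literal synset for učitel contributes a single-word expression without any synonym, which is why SWE can exceed the "1+ synonyms" column.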


While we describe the process as deriving synonyms by using the synonymy relation of parent nodes, the parent-child relation is not crucial. One could rephrase the process, e.g. by finding A’s child B and C’s parent D, or A’s grandparent B (and C’s grandchild D). However, given how thesauri are created, we expect them to contain the base word rather than the derived word (though the direction of the derivation is sometimes ambiguous). Longer-distance relations, on the other hand, are more likely to introduce noise.

Having introduced the basic principle, we suggest specific approaches to relation derivation. We assume access to richer morphological annotation, e.g. by using the MorphoDiTa tool.

5.1 Deriving adverb synonyms using adjectives

Adverbs are often derived from adjectives. While the proportion of adverbs in thesauri is close to their proportion in the corpus (approx. 2.6-3.2 % of content words, as measured on the Czech National Corpus SYN v4), we often ran into issues with adverbs in our text simplification experiments.

We suggest constraining nodes A and D to be adverbs and nodes B and C to be adjectives.

5.2 Deriving feminine forms from masculine

In Czech, some words come in different forms for men (or males generally) and women. This holds in particular for roles in relationships and for agent nouns, e.g. učitel ’teacher (man)’ and učitelka ’teacher (woman)’.

Thesauri usually only cover the masculine variants, both because they are usually the default and because native speakers can infer the feminine variant (still, some language knowledge is required: učitelka corresponds to učitel, but ministryně, not *ministrka, corresponds to ministr ’minister’).

We suggest constraining nodes A and D to be nouns of feminine gender and nodes B and C to be nouns of masculine gender.

5.3 Deriving possessive adjectives from nouns

Similarly to feminine forms, thesauri generally do not cover possessive adjectives, since they can easily be inferred from the corresponding noun.

We suggest constraining nodes A and D to be possessive adjectives and nodes B and C to be nouns.

5.4 Deriving verbs using verbs of opposite aspect

Thesauri differ in how they treat verb aspect, and they are often not consistent even internally. Sometimes the verb of the opposite aspect is listed as a synonym, sometimes each aspect forms its own synset, and sometimes the other aspect is missing completely.

We suggest constraining nodes A and D to have matching aspect, nodes B and C to have matching aspect, and nodes A and B to have opposite aspects.

The issue with this suggestion is that aspect information is neither present in the MorfFlex CZ dictionary nor provided by MorphoDiTa. We believe, however, that the annotation of the Czech National Corpus (e.g. the aforementioned SYN v4 [9]), which is enriched with aspect information, could be used.

6 Evaluation

We tried generating synonym pairs for adverbs using adjectives, for feminine forms using masculine forms and for possessive adjectives using nouns (see section 5 for more detail).

The generation procedure was carried out for each thesaurus individually and also for the concatenation of all the thesauri.

Our results are reported in table 2. We report the number of obtained pairs for each of the thesauri as well as for the concatenated thesaurus. The reported numbers are after symmetrization, i.e. after expanding any pair A-D into both A-D and D-A. The actual numbers of discovered pairs are usually 1.6-1.9 times greater, as most pairs (but not all of them) are discovered in both directions. These ratios seem to correlate slightly with the selected strategy (symmetrization helps most when finding feminine variants).

                  Department  LibreOffice  WordNet (PDT)  WordNet  Wiktionary  Concatenated
Adverbs via adjectives
  Obtained            21,498       26,293            731      952         318        34,766
  Confirmed            1,241        1,481             58       97          11         1,639
  Precision     0.77 when accepted by 2+ annotators; 0.47 when accepted by all annotators
Feminine via masculine
  Obtained             1,392        2,312            754      760         973         3,574
  Confirmed               72           89             32       36          99           129
  Precision     0.82 when accepted by 2+ annotators; 0.48 when accepted by all annotators
Possessive adjectives via nouns
  Obtained             2,292        4,329          1,089    1,094       2,818         7,512
  Confirmed                0            0              0        0           0             0
  Precision     0.84 when accepted by 2+ annotators; 0.61 when accepted by all annotators

Table 2: Results of our synonym pair generation. We report the number of pairs obtained using the thesauri, the number of pairs confirmed by existing thesauri and the human-rated precision on a sample of 100 non-confirmed pairs per strategy.

When evaluating a specific thesaurus, we can discover a synonym pair which is actually present in some other thesaurus. Whenever this happens, we consider such a pair
a confirmed one. We do not evaluate it further and assume that the pair is correctly derived and synonymous.

For each strategy, we sampled 100 non-confirmed pairs and asked 4 annotators to label each of them as synonym, antonym or unrelated. The annotators varied in gender and age, though all of them hold a university degree. The decision to let the annotators distinguish antonyms from unrelated pairs was based on our informal analysis of the results, which revealed that antonym pairs do occur.

In 5 cases, the annotators admitted they did not know; in a few others, they noted they were not really sure. In all of these cases, a very infrequent word was involved. We treat "I don’t know" answers as unrelated when reporting precision and inter-annotator agreement, and we treat answers marked "not sure" in the same way as unmarked ones.

The inter-annotator agreement (Fleiss’ kappa) was 0.47, and it varied slightly over the strategies (0.47, 0.42 and 0.50, respectively). These numbers might seem low, but it is important to keep in mind that most answers were synonym: since this answer had a high probability of being chosen, any disagreement on the other answers had a big impact on kappa, which corrects for chance agreement.

Examples of correctly discovered pairs (pairs annotated as synonyms by all four annotators) are given in table 3. There were 9 pairs marked as antonyms by all annotators (out of 28 marked as antonyms by at least one annotator); they are all listed in table 4. In all 9 cases, the pair was derived using a synonymy relation from the LibreOffice thesaurus, where either two antonyms were suggested as synonyms of the same word, e.g. both tlouštík ’fatty’ and hubeňour ’thin man’ for tlusťoch ’fatty’, or an antonym was suggested directly, e.g. inkompatibilní ’incompatible’ for kompatibilní ’compatible’. While these pairs are not synonymous, their existence should not be used to reject the principle. On the contrary, if the thesaurus pairs were correctly marked as antonyms, our method would correctly derive antonymous pairs.

  vodpovědně    spolehlivě       dependably
  hanebně       bezcharakterně   unscrupulously
  vyzvědačka    špehounka        she-spy
  čarodějnice   divotvorkyně     witch
  surovcův      kruťasův         bully’s
  maršálkův     maršálův         marshal’s

Table 3: Examples of correctly discovered pairs

  pokřiveně        distorted-ly                   rovno          straight
  povšechně        in general                     konkrétně      specifically
  různě            diversely, differently         identicky      identically
  mladice          young woman                    stařice        old woman
  tlusťoška        fat woman                      hubeňourka     thin woman
  živelně          elementally, unrestrainedly    organizovaně   organized-ly
  jemně            softly, lightly                pikantně       spicy, zesty
  inkompatibilně   incompatibly                   kompatibilně   compatibly
  bezcitně         heartlessly                    vřele          heartily

Table 4: Pairs marked as antonymous by all annotators

Finally, there were 14 pairs annotated as unrelated by all annotators. They are listed in table 5. In some cases (1, 4, 5), there is some evidence that the words can share a meaning, but at least one of the words is so strongly associated with another meaning that the annotators probably did not realize the meaning could be the same.

Some pairs (2, 3, 10, 11, 13) come from a synset in the thesauri, even though we could not find any other evidence that these pairs could really share a meaning.

Other pairs (6, 14) occur because of insufficiencies in the derivational process. While the base words are synonyms, the derived words are of distinct genders. These pairs could be prevented by constraining the suggested pairs more carefully.

Case 9 is quite similar. Both strůjce ’creator’ and otec ’father’ can refer to a creator (author), and strůjkyně ’she-creator’ is a feminine variant of strůjce. However, while otčina ’fatherland’ is directly derived from otec and is feminine, it does not in any way refer to a she-father. This could be prevented with more detailed annotation in the word-formation network.

There are cases (7, 8) where, despite the principle working as intended, the derived words are not really perceived as synonymous, even though the base words could be. For example, both kůň ’horse’ and osel ’donkey’ can be used to refer to a stupid person, but their feminine variants are not used in that way (even though in theory they could be).

The last case, 12, is special in many ways. The word zároveň ’at the same time, simultaneously’ is reported to be derived from rovný ’straight’, which might seem surprising. The pair is further derived from the thesaurus pair
rovný ’straight’ – zakroucený ’tortuous’, which is rather antonymous.

   1   hospodáříčkův   raráškův
   2   jedlice         nájemnice
   3   jezdkyně        běžkyně
   4   koňův           imbecilův
   5   léčitelův       věštcův
   6   nešičin         amatérův
   7   oslice          konice
   8   radikálně       zleva
   9   strůjkyně       otčina
  10   surovcův        katův
  11   vinařka         hornice
  12   zároveň         zakrouceně
  13   čile            umně
  14   šťouřin         kutilův

Table 5: Pairs marked as unrelated by all annotators

We do not provide a detailed report on pairs annotated differently by different annotators, though we have examined them too. In most cases, some evidence of a shared meaning exists, but some of the annotators did not consider the words synonymous.

Following from the above analysis, more than half of the unrelated pairs are no less related than their base-word counterparts. These pairs do not contradict our method; they only demonstrate the necessity of both checking thesaurus quality and being careful about synonymy itself, as it is perceived differently by different people.

There are cases when our method fails to filter out non-synonymous derived pairs. This could be improved both by better filtering during the inference process and by better annotation in the word-formation network.

7 Conclusion

We have presented a method of deriving new synonym pairs using existing thesauri and a word-formation network, suggested several strategies for the actual derivation and evaluated some of them.

Our evaluation revealed that about half of the derived synonym pairs are perceived as synonymous by all of our human annotators and around 80 % are perceived as synonymous by at least two of them. The erroneous word pairs are caused by two distinct factors. First, there are errors in the thesauri synsets: unrelated, or even antonymous, words are occasionally marked as synonymous. Second, our method has limitations: sometimes the derived words are not synonymous, despite being derived from synonymous base words.

Some of the limitations could be overcome by better filtering within our method or by more detailed annotation in the word-formation network. The latter became available soon after we carried out our experiments, when DeriNet 2.0 was released. This version has more detailed annotation of both nodes (e.g. noun gender) and edges (the purpose of the derivation is annotated, e.g. diminutivization), and we expect it to be helpful in future experiments.

We also plan to try using Derivancze [15] (which also includes derivation annotation) instead of DeriNet as the word-formation network and see whether it improves our results.

Overall, we consider our results good, because they suggest that thesauri authors can focus on capturing relations between base words and NLP applications can still make good use of those thesauri even for derived words.

Acknowledgement

This work has been supported by grant No. 1704218 of the Grant Agency of Charles University. It has used language resources and tools stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071). The research was also partially supported by SVV project number 260 453.

References

[1] Musa Alkhalifa and Horacio Rodríguez. Automatically extending NE coverage of Arabic WordNet using Wikipedia. In Proc. of the 3rd International Conference on Arabic Language Processing CITALA2009, Rabat, Morocco, 2009.
[2] Collin F Baker, Charles J Fillmore, and John B Lowe. The Berkeley FrameNet project. In Proceedings of the 17th international conference on Computational linguistics-Volume 1, pages 86–90. Association for Computational Linguistics, 1998.
[3] Eduard Bejček, Petra Hoffmannová, Martin Holub, Marie Hučínová, Pavel Pecina, Pavel Straňák, Pavel Šidák, and Jan Hajič. Lexico-semantic annotation of PDT using Czech WordNet, 2011. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
[4] William Black, Sabri Elkateb, and Piek Vossen. Introducing the Arabic WordNet project. In Proceedings of the Third International WordNet Conference (GWC-06). Citeseer, 2006.
[5] Marek Blahuš. Extending Czech WordNet using a bilingual dictionary. Master’s thesis, Faculty of Informatics, Masaryk University, 2011.
[6] Jan Hajič and Jaroslava Hlaváčová. MorfFlex CZ, 2013. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
[7] Jugal Kalita et al. Enhancing automatic WordNet construction using word embeddings. In Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP, pages 30–34, 2016.
[8] Svetla Koeva, Cvetana Krstev, and Duško Vitas. Morpho-semantic relations in WordNet–a case study for two Slavic languages. In Global wordnet conference, pages 239–253. University of Szeged, Department of Informatics, 2008.
[9] Michal Křen, Václav Cvrček, Tomáš Čapka, Anna Čermáková, Milena Hnátková, Lucie Chlumská, Tomáš Jelínek, Dominika Kováříková, Vladimír Petkevič, Pavel Procházka, Hana Skoumalová, Michal Škrabal, Petr Truneček, Pavel Vondřička, and Adrian Zasina. SYN v4: large corpus of written Czech, 2016. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
[10] Robert Meusel, Mathias Niepert, Kai Eckert, and Heiner Stuckenschmidt. Thesaurus extension using web search engines. In International Conference on Asian Digital Libraries, pages 198–207. Springer, 2010.
[11] George A Miller. Wordnet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.
[12] Verginica Barbu Mititelu. Adding morpho-semantic relations to the Romanian WordNet. In LREC, pages 2596–2601, 2012.
[13] Karel Pala, Tomáš Čapek, Barbora Zajíčková, Dita Bartůšková, Kateřina Kulková, Petra Hoffmannová, Eduard Bejček, Pavel Straňák, and Jan Hajič. Czech WordNet 1.9 PDT, 2011. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.
[14] Karel Pala and Dana Hlaváčková. Derivational relations in Czech WordNet. In Proceedings of the workshop on balto-slavonic natural language processing: Information extraction and enabling technologies, pages 75–81. Association for Computational Linguistics, 2007.
[15] Karel Pala and Pavel Šmerk. Derivancze—derivational analyzer of Czech. In International conference on text, speech, and dialogue, pages 515–523. Springer, 2015.
[16] Karel Pala and Pavel Smrž. Building Czech WordNet. Romanian Journal of Information Science and Technology, 7(1-2):79–88, 2004.
[17] Heidi Sand, Erik Velldal, and Lilja Øvrelid. WordNet extension via word embeddings: Experiments on the Norwegian WordNet. In Proceedings of the 21st Nordic Conference on Computational Linguistics, pages 298–302, 2017.
[18] Magda Ševčíková and Zdeněk Žabokrtský. Word-formation network for Czech. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), 2014.
[19] Matthew Shardlow. A survey of automated text simplification. International Journal of Advanced Computer Science and Applications, 4(1):58–70, 2014.
[20] Lucia Specia, Sujay Kumar Jauhar, and Rada Mihalcea. Semeval-2012 task 1: English lexical simplification. In Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pages 347–355. Association for Computational Linguistics, 2012.
[21] Jana Straková, Milan Straka, and Jan Hajič. Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 13–18, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
[22] Lonneke Van der Plas and Jörg Tiedemann. Finding synonyms using automatic word alignment and measures of distributional similarity. In Proceedings of the COLING/ACL on Main conference poster sessions, pages 866–873. Association for Computational Linguistics, 2006.
[23] Ake Viberg, Kerstin Lindmark, Ann Lindvall, and Ingmarie Mellenius. The Swedish WordNet project. In Proceedings of the Tenth EURALEX International Congress, EURALEX 2002: Copenhagen, Denmark, August 13-17, 2002, pages 407–412, 2003.
[24] Ichiro Yamada, Jong-Hoon Oh, Chikara Hashimoto, Kentaro Torisawa, Jun’ichi Kazama, Stijn De Saeger, and Takuya Kawada. Extending wordnet with hypernyms and siblings acquired from Wikipedia. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 874–882, 2011.
[25] Zdeněk Žabokrtský, Magda Ševčíková, Milan Straka, Jonáš Vidra, and Adéla Limburská. Merging data resources for inflectional and derivational morphology in Czech. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 1307–1314, 2016.

</pre>