=Paper=
{{Paper
|id=Vol-2473/paper25
|storemode=property
|title=Extending Czech Thesauri using Word-formation Network
|pdfUrl=https://ceur-ws.org/Vol-2473/paper25.pdf
|volume=Vol-2473
|authors=Karolína Hořeňovská
|dblpUrl=https://dblp.org/rec/conf/itat/Horenovska19
}}
==Extending Czech Thesauri using Word-formation Network==
Karolína Hořeňovská, Charles University, Faculty of Mathematics and Physics, horenovska@ufal.mff.cuni.cz

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

'''Abstract:''' In this paper, we attempt to extend existing Czech thesauri by using a word-formation network, DeriNet. Thesauri are an important resource for synonym retrieval and substitution generation, but their lexical sparsity is an issue in Czech. We discuss the properties of existing thesauri and of DeriNet and propose several ways of using DeriNet to extend the thesauri, such as deriving a synonym of an adverb from a synonym of the corresponding adjective. We also evaluate some of our proposals.

===1 Introduction===

A lot of effort has been invested in creating large thesauri, of which the best-known example is probably WordNet [11], followed by others such as FrameNet [2]. While these thesauri address English, there are many thesauri for other languages as well (e.g. WordNet versions for Arabic [4], Swedish [23], or Czech [16]). We wish to emphasize Czech WordNet since Czech is the language we currently deal with.

However, those thesauri are heavily incomplete for some languages, including Czech. This incompleteness presents a problem for various NLP tasks, e.g. substitution generation as part of lexical simplification (see [20] or [19] for more detail).

On the other hand, for some languages (including Czech), a rich word-formation network is available. We propose using such a network to extend existing thesauri, i.e. to discover synonymy relations between new pairs of words. Please note that while we target synonymy, as it is the only relation covered by all existing Czech thesauri, the approach would hold for any relation.

The rest of the paper is organized as follows: we briefly describe existing related work (section 2), present existing Czech thesauri (section 3) and describe the Czech word-formation network DeriNet (section 4). We then introduce several ways of combining DeriNet with thesauri to produce new relations (section 5) and evaluate the most promising of them (section 6).

===2 Related work===

Since thesauri are generally incomplete, there have been many attempts at extending them in an automated way. These attempts have included aligning multilingual resources (e.g. [22]), mining Wikipedia ([1], [24]) or the web in general ([10]), translating English WordNet (which has been tried especially for Czech [5], even though the extension itself, to the best of our knowledge, is not publicly available), making use of word embeddings ([17], [7]), and employing derivational morphology ([8], [12]).

This paper is in its nature similar to a previous attempt at extending Czech WordNet with derivational relations [14], which the authors claimed was successful. Unlike them, we use a publicly available source of derivations and we do not limit ourselves to WordNet: we try various thesauri and compare the outcome obtained with each of them and with their combination. We also share a more thorough evaluation of the resulting pairs.

===3 Existing thesauri===

We are aware of five notable Czech thesauri:

* the most recent version of Czech WordNet [16],
* a slightly divergent version of Czech WordNet [13], which lacks some synsets but contains some others which were created to enable the lexico-semantic annotation of the Prague Dependency Treebank [3], and which we refer to as WordNet (PDT),
* the thesaurus formerly distributed as a part of the LibreOffice office suite,
* Czech Wiktionary, and
* the ÚFAL thesaurus, a thesaurus developed at our department.

Both WordNet versions explicitly utilize synsets: each synset represents a meaning and lists literals (words or phrases) which can be used to express the meaning. Synsets might include a definition of the meaning but few have it filled in.

The last three thesauri employ synsets implicitly, either by assigning a word a set of sets of synonyms (as done in the LibreOffice thesaurus and Wiktionary) or by listing sets of synonyms and including some words in several such sets (as done in our department thesaurus).

We perform our experiments both using each of the thesauri individually and using a concatenated thesaurus, i.e. an artificial thesaurus created by concatenating all synsets from each of the real thesauri.

In our work, we do not make use of synsets. For each word, we merge its synonyms from all synsets and produce a single set of its synonyms (regardless of context, i.e. words which share the meaning in at least some contexts). This is partially to simplify the proof of concept, partially because senses in both WordNets are much more fine-grained than senses in other resources.

However, this step is in no way crucial. One could keep the synsets and, whenever we refer to retrieving synonyms, first retrieve the synsets and only then retrieve the words (either from specific synsets or from all of them). We actually expect to do this in our future work.

Some further statistics about the thesauri are provided in table 1. The Concatenated row corresponds to concatenating all synsets. Please note that we only work with single-word expressions (as opposed to multi-word expressions).

{| class="wikitable"
|+ Table 1: Basic statistics of thesauri: numbers of synsets, multi-word expressions (MWE), single-word expressions (SWE), unique synonym pairs (if a synonym pair can be retrieved from more than one synset, it is only counted once), words with at least 1 synonym; POS distribution for nouns (N), adjectives (A), verbs (V), adverbs (D) (the distribution need not sum up to 100 % because some thesauri also contain other POS such as prepositions)
! Thesaurus !! Synsets !! MWE !! SWE !! Pairs !! 1+ synonyms !! N !! A !! V !! D
|-
| Department || 18211 || 9 || 21382 || 40215 || 21382 || 46% || 27% || 23% || 4%
|-
| LibreOffice || 39460 || 17310 || 32085 || 118243 || 32085 || 42% || 26% || 24% || 4%
|-
| WordNet (PDT) || 23094 || 7696 || 17599 || 18015 || 10962 || 63% || 9% || 27% || 1%
|-
| WordNet || 28459 || 10846 || 21275 || 20067 || 14095 || 65% || 9% || 24% || 1%
|-
| Wiktionary || 43138 || 1038 || 40319 || 16657 || 11268 || 62% || 20% || 15% || 3%
|-
| Concatenated || 152362 || 28463 || 69064 || 162709 || 43516 || 54% || 22% || 20% || 3%
|}

===4 DeriNet===

DeriNet [18], [25] is a Czech word-formation network. Its nodes are Czech lexemes, i.e. lemmata, and the nodes do not have to cover all senses. The authors report that they decided to take a rather minimalistic approach to polysemy and only represent a lemma with more than one node if at least one of two conditions is met: it was coincidentally derived from two different words (e.g. the verb proudit, which is represented both as a base word, though it is likely related to the noun proud 'flow', and as a verb derived from udit 'to smoke', where proudit refers to smoking something thoroughly), or the senses lead to different sets of derived words (e.g. the verb stát 'to stand, to melt away').

The directed edges represent the fact that one word is derived from the other one. The edges should be taken as implicative: some derivations might not be captured in DeriNet (yet). They are discovered using a variety of methods, including manual deduction, rule-based automated processes and machine learning; many of them were also taken from the MorfFlex CZ morphological dictionary [6]. All discovered edges are manually confirmed before being added to the network.

By the authors' design decision, no word is allowed to have more than one parent, which simplifies the structure and could be justified by the low occurrence of compounds in Czech. Even though only one parent is allowed, recent versions of DeriNet allow for an indication of being a compound in the part-of-speech specification.

The then-current version of the network (1.7) contains 1,027,655 nodes, though only some of the nodes are supported by corpus evidence: when compared to the SYN v4 version of the Czech National Corpus [9], we found that as many as 591,486 nodes (i.e. more than a half) do not occur in the corpus. For the first version, only words which occurred at least twice in a SYN subcorpus of the Czech National Corpus (and fulfilled a few other conditions) were inserted in the network; this condition does not hold for lemmata inserted from the MorfFlex CZ dictionary.

Of all nodes, 104,563 (approx. 10 %) are isolated, i.e. they are not connected with any other node.

Except for the parent and part of speech, there is no further annotation, i.e. one cannot learn, for example, that the derived noun is an agent noun of the base verb. The DeriNet format is therefore fairly simple: it gives the node ID, its lemma and technical lemma (which contains some additional details such as sense disambiguation), its part of speech (perhaps with the above-mentioned indication of being a compound) and its parent's ID (if the node has a parent).

===5 Proposed thesauri extensions===

We propose the following principle of discovering new word relations:

# Find a non-root node A (i.e. a node which has a parent).
# Get A's parent, B.
# Retrieve B's synonyms using the existing thesauri.
# Find all nodes C which correspond to the retrieved synonyms.
# For each C, check if it has a child D which shares requested features with A.
# Declare A and D a related word pair.

This outline does not specify how to deal with the situation when more than one D exists for a single C (i.e. several children of C share the given features with A). In our experiments, we opted for choosing none of them (i.e. skipping the whole C subtree), but one could also develop strategies to select the best D or to generate more pairs for A from a single C.

It should be noted that due to this decision, discovering synonymous word pairs is not symmetric, that is, a word pair might be discovered when starting from one word but not when starting from the other one.

We actually suggest further constraining all of A, B, C and D to improve the reliability of the discovered relations, e.g. by constraining their part of speech. While part of speech is the only feature available in DeriNet itself, we can use e.g. the MorfFlex CZ dictionary or the MorphoDiTa tool for morphological analysis [21] to enable more features.

One could be tempted to only search for those non-root nodes A which are not covered by any thesaurus, the reasoning being that covered nodes already have their synonyms in the thesauri. However, thesauri entries for individual words are often incomplete and the outlined process could still find new synonyms for node A, even if node A is present in a thesaurus. Furthermore, considering only nodes A which are not covered by thesauri could lead to a decrease in the number of retrieved pairs after adding a new thesaurus, as some nodes could be newly skipped. We therefore do not constrain node A on its presence or absence in the thesauri.

While we describe the process as deriving synonyms by using the synonymy relation of parent nodes, the parent-child relation is not crucial. One could reword the process e.g. with finding A's child B and C's parent D, or with finding A's grandparent B (and C's grandchild D). However, by the nature of thesauri creation, we expect them to contain the base word rather than the derived word (though the direction of the derivation is sometimes ambiguous). Longer-distance relations, on the other hand, are more likely to introduce noise.

Having introduced the basic principle, we suggest specific approaches to relation derivation. We assume access to richer morphological annotation, e.g. by using the MorphoDiTa tool.

====5.1 Deriving adverb synonyms using adjectives====

Adverbs are often derived from adjectives, and while the adverb ratio in thesauri is close to the corpus ratio (approx. 2.6–3.2 % of content words as measured using the Czech National Corpus SYN v4), we often ran into issues with them in our text simplification experiments.

We suggest constraining nodes A and D to be adverbs and nodes B and C to be adjectives.

====5.2 Deriving feminine forms from masculine====

In Czech, some words come in different forms for men (or males generally) and women. This in particular holds for roles in relationships and for agent nouns, e.g. there is učitel 'teacher (man)' and učitelka 'teacher (woman)'.

Thesauri usually only cover the masculine variants, both because they are usually the default and because native speakers can infer the feminine variant (still, some language knowledge is required, e.g. there is učitelka to učitel but ministryně to ministr 'minister', not *ministrka).

We suggest constraining nodes A and D to be nouns having feminine gender and nodes B and C to be nouns having masculine gender.

====5.3 Deriving possessive adjectives from nouns====

Similarly to omitting feminine forms, thesauri generally do not cover possessive adjectives since they can be easily inferred from the corresponding noun.

We suggest constraining nodes A and D to be possessive adjectives and nodes B and C to be nouns.

====5.4 Deriving verbs using verbs of opposite aspect====

Thesauri differ in treating verb aspects, and often thesauri are not consistent even internally. Sometimes the verb of opposite aspect is listed as a synonym, sometimes both aspects form their own synset, sometimes the other aspect is completely missing.

We suggest constraining nodes A and D to have matching aspect, nodes B and C to have matching aspect, and nodes A and B to have opposite aspect.

The issue with this suggestion is that aspect information is neither present within the MorfFlex CZ dictionary nor provided by MorphoDiTa. We believe, however, that annotation from the Czech National Corpus (e.g. the above-mentioned version SYN v4) [9], which is enriched with aspect annotation, could be used.

===6 Evaluation===

We tried generating synonym pairs for adverbs using adjectives, for feminine forms using masculine forms and for possessive adjectives using nouns (see section 5 for more detail).

The generation procedure was carried out for each of the thesauri individually and also for the result of concatenating all the thesauri together.

Our results are reported in table 2. We report the number of obtained pairs for each of the thesauri as well as for the concatenated thesaurus. The reported numbers are after symmetrization, i.e. after expanding any pair A-D into both A-D and D-A. The actual numbers of discovered pairs are usually 1.6–1.9 times greater as most pairs (but not all of them) are discovered in both directions. These ratios seem to slightly correlate with the selected strategy (symmetrization is of greatest help when finding feminine variants).

{| class="wikitable"
|+ Table 2: Results of our synonym pair generation. We report the number of pairs obtained using the thesauri, number of pairs confirmed by existing thesauri and human-rated precision on a sample of 100 pairs.
! !! Department !! LibreOffice !! WordNet (PDT) !! WordNet !! Wiktionary !! Concatenated
|-
! colspan="7" | Adverbs via adjectives
|-
| Obtained || 21,498 || 26,293 || 731 || 952 || 318 || 34,766
|-
| Confirmed || 1,241 || 1,481 || 58 || 97 || 11 || 1,639
|-
| Precision || colspan="6" | 0.77 when accepted by 2+ annotators; 0.47 when accepted by all annotators
|-
! colspan="7" | Feminine via masculine
|-
| Obtained || 1,392 || 2,312 || 754 || 760 || 973 || 3,574
|-
| Confirmed || 72 || 89 || 32 || 36 || 99 || 129
|-
| Precision || colspan="6" | 0.82 when accepted by 2+ annotators; 0.48 when accepted by all annotators
|-
! colspan="7" | Possessive adjectives via nouns
|-
| Obtained || 2,292 || 4,329 || 1,089 || 1,094 || 2,818 || 7,512
|-
| Confirmed || 0 || 0 || 0 || 0 || 0 || 0
|-
| Precision || colspan="6" | 0.84 when accepted by 2+ annotators; 0.61 when accepted by all annotators
|}

When evaluating a specific thesaurus, we can discover a synonym pair which is actually present in some other thesaurus. Whenever this happens, we consider such a pair a confirmed one. We do not evaluate it further and expect that the pair is correctly derived and synonymous.

For each strategy, we sampled 100 non-confirmed pairs and asked 4 annotators to annotate them as either synonym, antonym or unrelated. The annotators were of varying gender and age, though all of them hold a university degree. Asking annotators to distinguish antonyms from unrelated pairs was done based on our informal result analysis, which revealed that antonym pairs do occur.

In 5 cases, the annotators admitted they did not know; in a few others, they noted they were not really sure. In all these cases, a very infrequent word was involved. We treat "I don't know" as unrelated when reporting precision and inter-annotator agreement. We treat answers marked as "not sure" in the same way as unmarked ones.

The inter-annotator agreement (Fleiss' kappa) was 0.47 and it varied slightly over the strategies (0.47, 0.42 and 0.50, respectively). These numbers might seem low, but it is important to keep in mind that most answers were synonym, hence this answer had a high probability, and therefore any disagreement on other answers had a big impact.

Examples of correctly discovered pairs (pairs annotated as synonyms by all four annotators) are given in table 3.

{| class="wikitable"
|+ Table 3: Examples of correctly discovered pairs
! Word !! Discovered synonym !! Meaning
|-
| vodpovědně || spolehlivě || dependably
|-
| hanebně || bezcharakterně || unscrupulously
|-
| vyzvědačka || špehounka || she-spy
|-
| čarodějnice || divotvorkyně || witch
|-
| surovcův || kruťasův || bully's
|-
| maršálkův || maršálův || marshal's
|}

There were 9 pairs marked as antonyms by all annotators (out of 28 marked as antonyms by at least one annotator); they are all listed in table 4. In all 9 cases, the pair was derived using a synonymy relation from the LibreOffice thesaurus, where either two antonyms were suggested as synonyms of the same word, e.g. both tlouštík 'fatty' and hubeňour 'thin man' for tlusťoch 'fatty', or an antonym was suggested directly, e.g. inkompatibilní 'incompatible' for kompatibilní 'compatible'. While these pairs are not synonymous, their existence should not be used to reject the principle. On the contrary, should the thesaurus pairs be correctly marked as antonyms, we would correctly derive antonymous pairs using our method.

{| class="wikitable"
|+ Table 4: Pairs marked as antonymous by all annotators
! Word !! Meaning !! Word !! Meaning
|-
| pokřiveně || distorted-ly || rovno || straight
|-
| povšechně || in general || konkrétně || specifically
|-
| různě || diversely, differently || identicky || identically
|-
| mladice || young woman || stařice || old woman
|-
| tlusťoška || fat woman || hubeňourka || thin woman
|-
| živelně || elementally, unrestrainedly || organizovaně || organized-ly
|-
| jemně || softly, lightly || pikantně || spicy, zesty
|-
| inkompatibilně || incompatibly || kompatibilně || compatibly
|-
| bezcitně || heartlessly || vřele || heartily
|}

Finally, there were 14 pairs annotated as unrelated by all annotators. They are listed in table 5.

{| class="wikitable"
|+ Table 5: Pairs marked as unrelated by all annotators
! No. !! Word !! Word
|-
| 1 || hospodáříčkův || raráškův
|-
| 2 || jedlice || nájemnice
|-
| 3 || jezdkyně || běžkyně
|-
| 4 || koňův || imbecilův
|-
| 5 || léčitelův || věštcův
|-
| 6 || nešičin || amatérův
|-
| 7 || oslice || konice
|-
| 8 || radikálně || zleva
|-
| 9 || strůjkyně || otčina
|-
| 10 || surovcův || katův
|-
| 11 || vinařka || hornice
|-
| 12 || zároveň || zakrouceně
|-
| 13 || čile || umně
|-
| 14 || šťouřin || kutilův
|}

In some cases (1, 4, 5), there is some evidence that the words can share a meaning, but at least one of the words is associated with another meaning so strongly that the annotators probably did not realize the meaning could be the same.

Some pairs (2, 3, 10, 11, 13) come from a synset in the thesauri, even though we could not find any other evidence that these pairs really could share the meaning.

Other pairs (6, 14) occur because of insufficiencies in the derivational process. While the base words are synonyms, the derived words are of distinct genders. These pairs could be prevented by constraining the suggested pairs more carefully.

Case 9 is quite similar. Both words strůjce 'creator' and otec 'father' could refer to a creator (author), and strůjkyně 'she-creator' is a feminine variant of strůjce. However, while the word otčina 'fatherland' is directly derived from otec and is feminine, it does not in any way refer to a she-father. This could be prevented with more detailed annotations in the word-formation network.

There are cases (7, 8) where, although the principle works as intended, the derived words are not really perceived as synonymous, even though the base words could be. For example, both words kůň 'horse' and osel 'donkey' could be used to refer to a dumb person, but their feminine variants are not used in that way (even though in theory they could be).

The last case, 12, is special in many ways. The word zároveň 'at the same time, simultaneously' is reported to be derived from the word rovný 'straight', which might seem surprising. The pair is further derived from the thesaurus pair rovný 'straight' – zakroucený 'tortuous', which is rather antonymous.

We do not provide a detailed report on pairs annotated differently by different annotators, though we have examined them too. In most cases, some evidence of shared meaning exists but some of the annotators did not consider the words synonymous.

Following from the above analysis, more than half of the unrelated pairs are no less related than their base-word counterparts. These pairs do not contradict our method; they only show the necessity of both checking thesauri quality and being careful about synonymy itself, as it is perceived differently by different people.

There are cases when our method fails to filter out non-synonymous derived pairs. This could be improved both by better filtering during the inference process and by having better annotation in the word-formation network.

===7 Conclusion===

We have presented a method of deriving new synonym pairs using existing thesauri and a word-formation network, we have suggested several strategies to do the actual derivation and we have evaluated some of them.

Our evaluation revealed that about half of the derived synonym pairs are really perceived as synonymous by all of our human annotators and around 80 % are perceived as synonymous by at least two of them. The erroneous word pairs are caused by two distinct factors. First, there are errors in the thesauri synsets: unrelated, or even antonymous, words are occasionally marked as synonymous. Second, there are limitations of our method, where the derived words are not synonymous despite being derived from synonymous base words.

Some of the limitations could be overcome by better filtering within our method or by more detailed annotations in the word-formation network. The latter became available soon after we carried out our experiments, as DeriNet 2.0 has been released. This version has a more detailed annotation of both nodes (e.g. noun gender) and edges (the purpose of derivation is annotated, e.g. diminutivization), and we expect this version to be helpful in future experiments.

We also plan to try using Derivancze [15] (which also includes derivation annotations) instead of DeriNet as the word-formation network and see if it helps to improve our results.

Overall, we consider our results good because they suggest that thesauri authors can focus on capturing the relations between the base words and NLP applications can still make good use of those thesauri even for derived words.

===Acknowledgement===

This work has been supported by the grant No. 1704218 of the Grant Agency of Charles University. It has used language resources and tools stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2015071). The research was also partially supported by SVV project number 260 453.

===References===

[1] Musa Alkhalifa and Horacio Rodríguez. Automatically extending NE coverage of Arabic WordNet using Wikipedia. In Proc. of the 3rd International Conference on Arabic Language Processing CITALA2009, Rabat, Morocco, 2009.

[2] Collin F Baker, Charles J Fillmore, and John B Lowe. The Berkeley FrameNet project. In Proceedings of the 17th international conference on Computational linguistics-Volume 1, pages 86–90. Association for Computational Linguistics, 1998.

[3] Eduard Bejček, Petra Hoffmannová, Martin Holub, Marie Hučínová, Pavel Pecina, Pavel Straňák, Pavel Šidák, and Jan Hajič. Lexico-semantic annotation of PDT using Czech WordNet, 2011. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

[4] William Black, Sabri Elkateb, and Piek Vossen. Introducing the Arabic WordNet project. In Proceedings of the third International WordNet Conference (GWC-06). Citeseer, 2006.

[5] Marek Blahuš. Extending Czech WordNet using a bilingual dictionary. Master's thesis, Faculty of Informatics, Masaryk University, 2011.

[6] Jan Hajič and Jaroslava Hlaváčová. MorfFlex CZ, 2013. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

[7] Jugal Kalita et al. Enhancing automatic WordNet construction using word embeddings. In Proceedings of the Workshop on Multilingual and Cross-lingual Methods in NLP, pages 30–34, 2016.

[8] Svetla Koeva, Cvetana Krstev, and Duško Vitas. Morpho-semantic relations in WordNet – a case study for two Slavic languages. In Global wordnet conference, pages 239–253. University of Szeged, Department of Informatics, 2008.

[9] Michal Křen, Václav Cvrček, Tomáš Čapka, Anna Čermáková, Milena Hnátková, Lucie Chlumská, Tomáš Jelínek, Dominika Kováříková, Vladimír Petkevič, Pavel Procházka, Hana Skoumalová, Michal Škrabal, Petr Truneček, Pavel Vondřička, and Adrian Zasina. SYN v4: large corpus of written Czech, 2016. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

[10] Robert Meusel, Mathias Niepert, Kai Eckert, and Heiner Stuckenschmidt. Thesaurus extension using web search engines. In International Conference on Asian Digital Libraries, pages 198–207. Springer, 2010.

[11] George A Miller. Wordnet: a lexical database for English. Communications of the ACM, 38(11):39–41, 1995.

[12] Verginica Barbu Mititelu. Adding morpho-semantic relations to the Romanian WordNet. In LREC, pages 2596–2601, 2012.

[13] Karel Pala, Tomáš Čapek, Barbora Zajíčková, Dita Bartůšková, Kateřina Kulková, Petra Hoffmannová, Eduard Bejček, Pavel Straňák, and Jan Hajič. Czech WordNet 1.9 PDT, 2011. LINDAT/CLARIN digital library at the Institute of Formal and Applied Linguistics (ÚFAL), Faculty of Mathematics and Physics, Charles University.

[14] Karel Pala and Dana Hlaváčková. Derivational relations in Czech WordNet. In Proceedings of the workshop on balto-slavonic natural language processing: Information extraction and enabling technologies, pages 75–81. Association for Computational Linguistics, 2007.

[15] Karel Pala and Pavel Šmerk. Derivancze – derivational analyzer of Czech. In International conference on text, speech, and dialogue, pages 515–523. Springer, 2015.

[16] Karel Pala and Pavel Smrž. Building Czech WordNet. Romanian Journal of Information Science and Technology, 7(1-2):79–88, 2004.

[17] Heidi Sand, Erik Velldal, and Lilja Øvrelid. WordNet extension via word embeddings: Experiments on the Norwegian WordNet. In Proceedings of the 21st Nordic Conference on Computational Linguistics, pages 298–302, 2017.

[18] Magda Ševčíková and Zdeněk Žabokrtský. Word-formation network for Czech. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), 2014.

[19] Matthew Shardlow. A survey of automated text simplification. International Journal of Advanced Computer Science and Applications, 4(1):58–70, 2014.

[20] Lucia Specia, Sujay Kumar Jauhar, and Rada Mihalcea. Semeval-2012 task 1: English lexical simplification. In Proceedings of the First Joint Conference on Lexical and Computational Semantics-Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation, pages 347–355. Association for Computational Linguistics, 2012.

[21] Jana Straková, Milan Straka, and Jan Hajič. Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 13–18, Baltimore, Maryland, June 2014. Association for Computational Linguistics.

[22] Lonneke Van der Plas and Jörg Tiedemann. Finding synonyms using automatic word alignment and measures of distributional similarity. In Proceedings of the COLING/ACL on Main conference poster sessions, pages 866–873. Association for Computational Linguistics, 2006.

[23] Ake Viberg, Kerstin Lindmark, Ann Lindvall, and Ingmarie Mellenius. The Swedish WordNet project. In Proceedings of the Tenth EURALEX International Congress, EURALEX 2002: Copenhagen, Denmark, August 13-17, 2002, pages 407–412, 2003.

[24] Ichiro Yamada, Jong-Hoon Oh, Chikara Hashimoto, Kentaro Torisawa, Jun'ichi Kazama, Stijn De Saeger, and Takuya Kawada. Extending wordnet with hypernyms and siblings acquired from Wikipedia. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 874–882, 2011.

[25] Zdeněk Žabokrtský, Magda Ševčíková, Milan Straka, Jonáš Vidra, and Adéla Limburská. Merging data resources for inflectional and derivational morphology in Czech. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pages 1307–1314, 2016.
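
The six-step procedure of section 5 and the symmetrization used in section 6 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the paper: the function names (`discover_pairs`, `symmetrize`), the plain-dictionary representation of the network and of the merged synonym sets, and the toy data are all hypothetical. Nodes are keyed by lemma for brevity, whereas DeriNet identifies nodes by ID, and the part-of-speech tags follow the N/A/V/D letters of table 1.

```python
from collections import defaultdict


def discover_pairs(parent, pos, synonyms, child_pos, parent_pos):
    """Propose (A, D) pairs whose parents B and C are thesaurus synonyms.

    parent:   dict lemma -> parent lemma (contains non-root nodes only)
    pos:      dict lemma -> part-of-speech tag, e.g. "A" or "D"
    synonyms: dict lemma -> set of synonyms merged over all synsets
    child_pos, parent_pos: the constraint on A, D and on B, C respectively
    """
    # Invert the parent map so that the children of a node can be listed.
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)

    pairs = set()
    for a, b in parent.items():                  # steps 1 and 2: A and its parent B
        if pos.get(a) != child_pos or pos.get(b) != parent_pos:
            continue
        for c in synonyms.get(b, ()):            # steps 3 and 4: synonyms C of B
            if pos.get(c) != parent_pos:
                continue
            # Step 5: children of C sharing the requested feature with A.
            candidates = [d for d in children.get(c, ())
                          if pos.get(d) == child_pos and d != a]
            # Several matching children: skip this C, as done in the experiments.
            if len(candidates) == 1:
                pairs.add((a, candidates[0]))    # step 6
    return pairs


def symmetrize(pairs):
    """Expand every pair A-D into both A-D and D-A."""
    return pairs | {(d, a) for a, d in pairs}


# Toy data (assumed for illustration): adverbs derived from adjectives.
parent = {"hanebně": "hanebný", "bezcharakterně": "bezcharakterní"}
pos = {"hanebně": "D", "bezcharakterně": "D",
       "hanebný": "A", "bezcharakterní": "A"}
synonyms = {"hanebný": {"bezcharakterní"}}

found = discover_pairs(parent, pos, synonyms, child_pos="D", parent_pos="A")
print(sorted(symmetrize(found)))
```

Skipping a synonym C with several matching children is what makes the discovery direction-dependent, which is why a separate symmetrization step is applied before counting pairs.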