=Paper= {{Paper |id=Vol-3033/paper18 |storemode=property |title=A Methodology for Large-Scale, Disambiguated and Unbiased Lexical Knowledge Acquisition Based on Multilingual Word Alignment |pdfUrl=https://ceur-ws.org/Vol-3033/paper18.pdf |volume=Vol-3033 |authors=Francesca Grasso,Luigi Di Caro |dblpUrl=https://dblp.org/rec/conf/clic-it/GrassoC21 }} ==A Methodology for Large-Scale, Disambiguated and Unbiased Lexical Knowledge Acquisition Based on Multilingual Word Alignment== https://ceur-ws.org/Vol-3033/paper18.pdf
    A Methodology for Large-Scale, Disambiguated and Unbiased Lexical
      Knowledge Acquisition Based on Multilingual Word Alignment

                                    Francesca Grasso, Luigi Di Caro
                           University of Turin, Department of Computer Science
                             {fr.grasso,luigi.dicaro}@unito.it



Abstract

In order to be concretely effective, many NLP applications require the availability of lexical resources providing varied, broadly shared, and language-unbounded lexical information. However, state-of-the-art knowledge models rarely adopt such a comprehensive and cross-lingual approach to semantics. In this paper, we propose a novel automatable methodology for knowledge modeling based on a multilingual word alignment mechanism that enhances the encoding of unbiased and naturally disambiguated lexical knowledge. Results from a simple implementation of the proposal show relevant outcomes that are not found in other resources.

1    Introduction

Lexical resources constitute a key instrument for many NLP tasks such as Word Sense Disambiguation and Machine Translation. However, their potential may vary widely depending on the nature of the lexical-semantic knowledge they encode, as well as on how the linguistic data are stored and linked within the network (Zock and Biemann, 2020). The resources that are presently available, such as WordNet (Miller, 1995), typically encode lexical-semantic knowledge mainly in terms of word senses, defined by textual (i.e. dictionary) definitions, and lexical entries are linked and put in context through lexical-semantic relations. These relations, being only of a paradigmatic nature, are characterized by a sharing of the same defining properties between the words and a requirement that the words be of the same syntactic class (Morris and Hirst, 2004). Typically related words are therefore not represented due to the absence of syntagmatic links. Additionally, word senses suffer from a lack of explicit common-sense knowledge and context-dependent information. Finally, the well-known fine granularity of word senses in WordNet (Palmer et al., 2007) is due to the lack of a meaning encoding system capable of representing concepts in a flexible way. Other kinds of resources such as FrameNet (Baker et al., 1998) and ConceptNet (Speer et al., 2017) present the same issue, while returning different types and degrees of structural semantic information and disambiguation capabilities.

In this contribution, we provide a novel methodology for the retrieval and representation of unbiased and naturally disambiguated lexical information that relies on a multilingual word alignment mechanism. In particular, we exploit textual resources in different languages1 in order to acquire and align varied lexical-semantic material of the form  that are common and shared by all the k languages involved. As we demonstrate through a simple implementation, our method makes it possible to create new lexical-semantic relations between words that are not always available in other resources, as well as to perform an automatic word sense disambiguation process. This system therefore enhances the encoding of prototypical semantic information of concepts that is also likely to be free from strong cultural-linguistic and lexicographic biases.

   1 In this work, we start with the combination of three languages: English, German and Italian.

The benefits provided by our novel multilingual word alignment mechanism are thus fourfold: (i) a linguistic and lexicographic de-biasing of lexical knowledge; (ii) naturally-disambiguated aligned lexical entries; (iii) the discovery of novel lexical-semantic relations; and (iv) the representation of prototypical semantic information of concepts in different languages.

   Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2    Background and Related Work

2.1    Bias Types

Due to its complex and fluid nature, lexical semantics needs to undergo a process of abstraction and simplification in order to be encoded into a formal model. As a result, lexical knowledge provided by lexical resources - especially when monolingual - will inherently carry different types of biases. In particular, i) linguistic and ii) lexicographic biases affect the encoding, consumption, and exploitation of lexical knowledge in downstream tasks.

Linguistic bias. Lexical information encoded in a language’s lexicon, as well as the potential contexts in which a given lexeme can occur, inevitably reflect the socio-cultural background of the speakers of that language. Lexical resources used for the compilation of lexical knowledge are often conceived as monolingual, therefore they mostly return culture-bounded semantic information which does not account for more shared knowledge.

Lexicographic bias. The nuclear components extracted from textual definitions can be different depending on the resource used, even within a single language (Kiefer, 1988). For example, the definition of “cow” reported by the Oxford Dictionary is “a large animal kept on farms to produce milk or beef” while the Merriam-Webster Dictionary reports “the mature female of cattle”. Both endogenous and exogenous properties can be subjectively reported (Woods, 1975), such as the term “large” and the milk production respectively.

2.2    Related Work

On one side, lexicons are built on top of synsets2 and contextualize meanings (or senses) mainly in terms of paradigmatic relations. WordNet (Miller, 1995) and BabelNet (Navigli and Ponzetto, 2010) can be seen as the cornerstone and the summit in that respect. However, if on the one hand WordNet’s dense network of taxonomic relationships allows a high degree of systematization, on the other hand, a key unsolved issue with “wordnets” is the fine granularity of their inventories. Note that multilingualism in BabelNet is provided as an indexing service rather than as an alignment and unbiasing systematization method.

   2 Words considered as synonyms in specific contexts.

Extensions of these resources also include Common-Sense Knowledge (CSK), which refers to some (to a certain extent) widely-accepted and shared information. CSK describes the kind of general knowledge material that humans use to define, differentiate and reason about the conceptualizations they have in mind (Ruggeri et al., 2019). ConceptNet (Speer et al., 2017) is one of the largest CSK resources, collecting and automatically integrating data starting from the original MIT Open Mind Common Sense project3. However, terms in ConceptNet are not disambiguated. Property norms (McRae et al., 2005; Devereux et al., 2014) represent a similar kind of resource, which is more focused on the cognitive and perception-based aspects of word meaning. Norms, in contrast with ConceptNet, are based on semantic features empirically-constructed via questionnaires producing lexical (often ambiguous) labels associated with target concepts, without any systematic methodology of knowledge collection and encoding.

   3 https://www.media.mit.edu/projects/open-mind-common-sense/overview/

Another widespread modeling approach is based on vector space models of lexical knowledge. Vectors are automatically learnt from large corpora utilizing a wide range of statistical techniques, all centered on Harris’ distributional assumption (Harris, 1954), i.e. words that occur in the same contexts tend to have similar meanings. Well-known models include word embeddings (Mikolov et al., 2013; Pennington et al., 2014; Bojanowski et al., 2016), sense embeddings (Huang et al., 2012; Iacobacci et al., 2015; Kumar et al., 2019), and contextualized embeddings (Scarlini et al., 2020). However, the relations holding between vector representations are not typed, nor are they organized systematically.

Among the several other modeling strategies proposed, lexicographic-centered resources have been focused on the contextualization of lexical items within syntactic structures, e.g. Corpus Pattern Analysis (CPA) (Hanks, 2004), situation frames such as FrameNet (Fillmore, 1977; Baker et al., 1998) and conceptual frames (Moerdijk et al., 2008; Leone et al., 2020). Words are not taken in isolation and the meaning they are attributed is connected to prototypical patterns or typed slots. However, these theories and methods for building semantic resources remain linked to the lexical basis and do not manage the mentioned biases.

3    The Multilingual Word Alignment

As is known, a single word form can be associated with more than one related sense, causing what is referred to as semantic ambiguity, or polysemy. This phenomenon, however, manifests itself differently across languages, since each language encodes meaning into words in its own particular way. We can therefore assume that, while a given polysemous word may be ambiguous in a certain context, a semantically corresponding word in another language may not be. Based on this assumption, it is possible to exploit this cross-language property to disambiguate a given word using its semantic equivalent in another language when they both occur in the same context. Such a disambiguation process can take place because the two words feature different semantic - specifically, polysemous - behaviours. Accordingly, we developed a knowledge acquisition methodology that features the power of word sense disambiguation, relying on a multilingual alignment mechanism.

After providing a brief illustration of the languages we have selected for this first trial, we describe the methodology in more detail by using a basic example. Afterwards, a simple implementation of the proposed mechanism is presented.

3.1    Languages Involved

Among the benefits provided by the multilingual word alignment methodology we propose, one is that it prevents the represented lexical information from containing strong cultural-linguistic biases. This objective is pursued through the use of three different languages, reflecting in turn three diverse backgrounds. For this first trial we involved English, German and Italian. These languages were chosen primarily because we are proficient in them, therefore we are able to exert control over the data of our trial, as well as to interpret the results properly. Concurrently, given the nature of the methodology, it was necessary to select a set of languages with a certain degree of similarity in terms of shared lexical-semantic material. Indeed, the alignment mechanism can work and be effective as long as the lexical-semantic systems of the languages involved reflect a somewhat similar cultural-linguistic background. For example, we might expect languages to agree on the meanings of “carp”, “cottage” and “sled” as long as speakers of these languages have comparable exposure to the relevant data. We would not expect a language spoken in a place without carps to have a word corresponding to “carp”. The purpose of this project is not to forcibly identify universally valid semantic relationships, but rather to avoid reporting biased information deriving from the use of data coming from a single linguistic context. For this reason, in our case the choice fell on European languages4 (two Germanic languages and a Romance one).

   4 By “European” we refer to the European linguistic area.

We now describe in detail the alignment mechanism through a basic example. Consider the following word forms: wool (EN); Wolle (DE); lana (IT), expressing a single target concept5.

   5 An absolute monosemy is, of course, realistically unreachable.

For each of the three lexical forms we collect a set of related words in terms of paradigmatic (e.g. synonyms) and syntagmatic (e.g. co-occurrences) relations. The target-related words can possibly be modifiers, verbs, or substantives. We thus obtain three different lists of words, one for each of the languages involved. The retrieved terms in the lists are still potentially ambiguous, since they refer to a lexical form rather than to a contextually defined concept. Table 1 provides a small excerpt of such unordered lists of related words.

    wool          Wolle           lana
    ---------------------------------------
    sheep         Schal           cotone
    cotton        spinnen         Biella
    synthetic     Baumwolle       sintetica
    spin          Rudolf          sciarpa
    scarf         synthetisch     pecora
    mitten        Schafe          filare

Table 1: Unordered lists of single-language related words for the target word forms wool (EN), Wolle (DE), lana (IT).

The lexical data in the lists are subsequently compared and filtered in order to select only the semantic items that occur in all the lists, i.e., those shared by the three languages6 in the reported example. The resulting words are thus aligned with their semantic counterparts, generating a set of aligned triplets, as shown in Table 2.

   6 This implies the presence of a translation step.

This multilingual word alignment provides, as a consequence, an automatic Word Sense Disambiguation system. Once the triplets are formed, their members will indeed be associated with a
likely unique sense, i.e. the one coming from the intersection of all possible language-specific senses related to the three words. In other terms, the target-related words, once aligned, naturally identify (and provide) a common semantic context. As a consequence, potentially polysemous words are disambiguated through such context, without any support from sense repositories. For example, the context-consistent sense of the verb to spin (EN), which is a highly polysemous word in English, can be identified by selecting the only sense that is also shared by the other two aligned words, i.e. “turn fibres into thread”. In fact, neither spinnen (DE) nor filare (IT) can possibly mean e.g. “rotate”.

    wool                Wolle                 lana
    ---------------------------------------------------
    sheep        ↔      Schafe         ↔      pecora
    cotton       ↔      Baumwolle      ↔      cotone
    synthetic    ↔      synthetisch    ↔      sintetica
    spin         ↔      spinnen        ↔      filare
    scarf        ↔      Schal          ↔      sciarpa

Table 2: Examples of aligned concept-related words for the target word forms wool (EN), Wolle (DE), lana (IT).

This mechanism generates a twofold effect: besides performing word sense disambiguation, it also provides lexical knowledge in the form of (paradigmatic and syntagmatic) lexical-semantic relations between words that is also language-unbounded. In the first place, the uncontrolled character of the data retrieval and alignment process allows the generation of novel lexical-semantic relations that are likely not available in other structured resources. Additionally, since the resulting set of words related to the target can be only the one shared by multiple languages, the lexical information obtained does not reflect a single cultural/linguistic background, but rather a common and shared one. For example, in Table 1 the presence of the word “Biella” among the list of words related to “lana” probably refers to the fact that the Italian city Biella is (locally) famous for its wool, therefore the two words may co-occur frequently. Similarly, if we consider the alignment , a lexeme related to the English word form would be “rain”, due to the well-known idiom “it’s raining cats and dogs”. However, neither “Biella” nor corresponding words for “rain” can possibly result in the lists of related words of the respective other languages, being language-specific items within those contexts. Therefore, the lexical information provided by the alignment mechanism will be free from strong cultural-linguistic biases. Finally, as illustrated in the next section, by exploiting multiple and differently built resources, we are able to reduce arbitrariness and lexicographic biases within the lexical knowledge represented.

4    Implementation

In this section we describe details and results of a simple implementation of the proposed alignment mechanism for the acquisition of disambiguated and unbiased lexical information. In particular, the system is composed of two main modules: a context generation and an alignment procedure. We finally report the results of an evaluation to highlight mainly (i) the autonomous disambiguation power of the approach, (ii) the quality of the alignments and their unbiased and syntagmatic nature, and (iii) the amount of unveiled lexical-semantic relations not covered by existing state-of-the-art resources such as BabelNet.

    POS     scale       bilancia        Waage
    ----------------------------------------------
    noun    accuracy    precisione      Genauigkeit
    noun    balance     equilibrio      Balance
    noun    bulk        massa           Masse
    noun    control     controllo       Kontrolle
    noun    device      dispositivo     Gerät
    noun    figure      cifra           Zahl
    adj     accurate    preciso         genau
    adj     smart       intelligente    intelligent
    verb    indicate    indicare        zeigen
    verb    set         regolare        einstellen

Table 3: 10 automatic alignments (out of 74) for the target concept scale-bilancia-Waage (BabelNet synset: 00069470n).

4.1    Context for Multilingual Alignment

To retrieve the concept-related words for the multilingual alignment we made use of two textual resources: Sketch Engine (Kilgarriff et al., 2014) and the Leipzig Corpora Collection (Quasthoff et al., 2014). Through the former, we searched for related words with its tool named “Word Sketch” on the TenTen Corpus Family7. In particular, we were able to automatically collect words appearing in the following grammatical relations: “modifiers of w”, “adj. predicates of w”, “verbs with w as subject” and “verbs with w as object”. The retrieved concept-related words are then lemmatized and marked with the suitable POS tags. Finally, we utilized the Leipzig Corpora Collection portal for searching additional context words in terms of left and right (POS-tagged) co-occurrences.

   7 https://www.sketchengine.eu/documentation/tenten-corpora

4.2    Multilingual Alignment

The Google Translate API was used for finding translations of related words in the three languages8. In particular, given a certain term t_L1 in a language L1, we opted for retrieving all its possible translations into the other two languages (L2, L3). We then tried to match each translated item with the previously-retrieved sets of related words in L2, L3. Whenever the [t_L1 ↔ t_L2]; [t_L1 ↔ t_L3] match succeeded, we finally checked any possible [t_L2 ↔ t_L3] match. If a [t_L1 ↔ t_L2 ↔ t_L3] semantic equivalence occurs, then the alignment can take place. Table 3 shows an excerpt of automatic alignments for the concept scale (bn:00069470n).

   8 No surrounding syntactic context for the words to align was available for more advanced Machine Translation.

4.3    Evaluation

Our aim is not to overcome state-of-the-art resources but rather to incorporate new and unbiased semantic relations from a novel multilingual alignment mechanism. In particular, we wanted to verify to what extent our knowledge acquisition method is able to unveil lexical relations not yet covered by a state-of-the-art resource (BabelNet).

Thus, we first generated sets of related words from BabelNet in order to compare them with those produced and aligned by our (automatized) methodology. In particular, through the BabelNet API, we obtained the English, Italian, and German lexicalizations of the synsets connected to it, together with the words included in their glosses9.

   9 We used the SpaCy library to analyze, extract and lemmatize the text - https://spacy.io.

As test cases, we randomly picked 500 concepts constituting polysemous words in at least one of the three languages, obtaining non-empty alignments for 456 of them. In Table 4 we report the results of the alignment on six concepts.

                 00008050n   00069470n   00069470n   00062766n   00008364n   00008363n   average
    (en)         libra       scale       plane       plane       bank        bank
    (it)         bilancia    bilancia    aereo       piano       banca       riva
    (de)         Waage       Waage       Flugzeug    Ebene       Bank        Ufer
    triplets     26          74          272         151         349         80
    novel(en)    88.46%      87.84%      88.97%      89.40%      87.68%      91.25%      88.9%
    novel(it)    76.92%      66.22%      75.74%      73.51%      75.64%      68.75%      72.8%
    novel(de)    88.46%      74.32%      87.87%      84.11%      81.66%      76.25%      82.1%

Table 4: Alignments for six ambiguous concepts and percentage of unveiled novel relations in each language with respect to the BabelNet database. Some examples of triplets for the concept scale-bilancia-Waage (bn:00069470n) are shown in Table 3.

Despite its limitations, our first implementation of the proposed methodology was able to discover a total of 76,152 multilingual alignments over the 456 concepts, with (on average) more than 80% novel semantic relations with respect to what is currently encoded in BabelNet across the three languages. Still, the extracted data represent mostly unbiased and disambiguated knowledge, leading towards the construction of a new large-scale and multilingual prototypical lexical database.

5    Conclusions and Future Work

In this paper we proposed an original methodology for acquiring and encoding lexical knowledge through a novel yet simple mechanism of multilingual alignment. The aim was to represent varied, disambiguated, and language-unbounded lexical knowledge by minimizing strong linguistic and lexicographic biases. A simple implementation and experimentation on 456 concepts led to the discovery of around 76K aligned lexical-semantic features, of which more than 80% were new when compared with a current state-of-the-art resource such as BabelNet. Future directions include the use of more languages and large-scale runs over thousands of main concepts (Bentivogli et al., 2004; Di Caro and Ruggeri, 2019; Camacho-Collados and Navigli, 2017).

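
As an illustration of the matching procedure described in Section 4.2 and of the novelty percentages reported in Table 4, the following is a minimal Python sketch, not the implementation used in this work. The function names (align_triplets, novelty_rate), the toy translation dictionaries (which stand in for the output of a translation service) and the known_en set (which stands in for related words derived from BabelNet) are all assumptions made for this example. Only the related-word sets are taken from Table 1.

```python
from itertools import product


def align_triplets(related_l1, related_l2, related_l3,
                   l1_to_l2, l1_to_l3, l2_to_l3):
    """Return the set of (t_L1, t_L2, t_L3) triplets that translate into one another.

    related_*: sets of words related to the target in each language.
    *_to_*:    dicts mapping a word to the set of its candidate translations.
    """
    triplets = set()
    for t1 in related_l1:
        # Keep only the translations of t1 that were also retrieved as related
        # words in the other two languages: [t_L1 <-> t_L2]; [t_L1 <-> t_L3].
        matches_l2 = l1_to_l2.get(t1, set()) & related_l2
        matches_l3 = l1_to_l3.get(t1, set()) & related_l3
        for t2, t3 in product(matches_l2, matches_l3):
            # Final check: [t_L2 <-> t_L3] must hold as well.
            if t3 in l2_to_l3.get(t2, set()):
                triplets.add((t1, t2, t3))
    return triplets


def novelty_rate(aligned_words, known_words):
    """Percentage of aligned words that are absent from a reference resource."""
    aligned = set(aligned_words)
    if not aligned:
        return 0.0
    return 100.0 * len(aligned - set(known_words)) / len(aligned)


# Related words for wool / Wolle / lana, as listed in Table 1.
related_en = {"sheep", "cotton", "synthetic", "spin", "scarf", "mitten"}
related_de = {"Schal", "spinnen", "Baumwolle", "Rudolf", "synthetisch", "Schafe"}
related_it = {"cotone", "Biella", "sintetica", "sciarpa", "pecora", "filare"}

# Hypothetical translation candidates (toy data, not real API output).
en_to_de = {
    "sheep": {"Schaf", "Schafe"}, "cotton": {"Baumwolle"},
    "synthetic": {"synthetisch"}, "spin": {"spinnen", "drehen"},
    "scarf": {"Schal"}, "mitten": {"Fausthandschuh"},
}
en_to_it = {
    "sheep": {"pecora"}, "cotton": {"cotone"},
    "synthetic": {"sintetico", "sintetica"}, "spin": {"filare", "girare"},
    "scarf": {"sciarpa"}, "mitten": {"muffola"},
}
de_to_it = {
    "Schafe": {"pecora", "pecore"}, "Baumwolle": {"cotone"},
    "synthetisch": {"sintetico", "sintetica"}, "spinnen": {"filare"},
    "Schal": {"sciarpa"},
}

triplets = align_triplets(related_en, related_de, related_it,
                          en_to_de, en_to_it, de_to_it)

# Hypothetical set of English words already linked to the target in a
# reference resource; in Section 4.3 this role is played by BabelNet.
known_en = {"sheep", "cotton"}
aligned_en = {t1 for t1, _, _ in triplets}
print(sorted(triplets))
print(novelty_rate(aligned_en, known_en))
```

With this toy data, language-specific items such as "Biella" and "Rudolf" have no counterpart in the other lists and are filtered out, while "spin" survives only through the candidates that also appear among the German and Italian related words. Intersecting translation candidates with the related-word sets before the final pairwise check keeps the number of candidate pairs small.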
References

Collin F Baker, Charles J Fillmore, and John B Lowe. 1998. The Berkeley FrameNet project. In 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1, pages 86–90.

Luisa Bentivogli, Pamela Forner, Bernardo Magnini, and Emanuele Pianta. 2004. Revising the WordNet Domains hierarchy: semantics, coverage and balancing. In Proceedings of the workshop on multilingual linguistic resources, pages 94–101.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2016. Enriching word vectors with subword information. arXiv preprint arXiv:1607.04606.

Jose Camacho-Collados and Roberto Navigli. 2017. BabelDomains: Large-scale domain labeling of lexical resources. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 223–228.

Barry J Devereux, Lorraine K Tyler, Jeroen Geertzen, and Billi Randall. 2014. The CSLB concept property norms. Behavior Research Methods, 46(4):1119–1127.

Luigi Di Caro and Alice Ruggeri. 2019. Unveiling middle-level concepts through frequency trajectories and peaks analysis. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, pages 1035–1042.

Charles J Fillmore. 1977. Scenes-and-frames semantics. Linguistic structures processing, 59:55–88.

Patrick Hanks. 2004. Corpus pattern analysis. In Euralex Proceedings, volume 1, pages 87–98.

Zellig S Harris. 1954. Distributional structure. Word, 10(2-3):146–162.

Sawan Kumar, Sharmistha Jat, Karan Saxena, and Partha Talukdar. 2019. Zero-shot word sense disambiguation using sense definition embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5670–5681.

Valentina Leone, Giovanni Siragusa, Luigi Di Caro, and Roberto Navigli. 2020. Building semantic grams of human knowledge. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 2991–3000.

Ken McRae, George S Cree, Mark S Seidenberg, and Chris McNorgan. 2005. Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37(4):547–559.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.

George A Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41.

Fons Moerdijk, Carole Tiberius, and Jan Niestadt. 2008. Accessing the ANW dictionary. In Proc. of the workshop on Cognitive Aspects of the Lexicon, pages 18–24.

Jane Morris and Graeme Hirst. 2004. Non-classical lexical semantic relations. In Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004, pages 46–51, Boston, Massachusetts, USA, May 2 - May 7. Association for Computational Linguistics.

Roberto Navigli and Simone Paolo Ponzetto. 2010. BabelNet: Building a very large multilingual semantic network. In Proc. of ACL, pages 216–225. Asso-
                                                            ciation for Computational Linguistics.
Eric H Huang, Richard Socher, Christopher D Man-
   ning, and Andrew Y Ng. 2012. Improving word            Martha Palmer, Hoa Trang Dang, and Christiane Fell-
   representations via global context and multiple word    baum. 2007. Making fine-grained and coarse-
   prototypes. In Proc. of ACL, pages 873–882.             grained sense distinctions, both manually and auto-
Ignacio Iacobacci, Mohammad Taher Pilehvar, and            matically. Nat.Lan.Eng., 13(02):137–163.
   Roberto Navigli. 2015. SensEmbed: learning sense
   embeddings for word and relational similarity. In      Jeffrey Pennington, Richard Socher, and Christopher D
   Proceedings of ACL, pages 95–105.                         Manning. 2014. Glove: Global vectors for word
                                                             representation. In EMNLP, volume 14, pages 1532–
Ferenc Kiefer. 1988. Linguistic, conceptual and ency-        43.
  clopedic knowledge: Some implications for lexicog-
  raphy. In T. Magay and J. Zigány, editors, Proceed-    Uwe Quasthoff, Dirk Goldhahn, and Thomas Eckart.
  ings of the 3rd EURALEX International Congress,           2014. Building large resources for text mining: The
  pages 1–10, Budapest, Hungary, sep. Akadémiai            leipzig corpora collection. In Text Mining, pages 3–
  Kiadó.                                                   24. Springer.

Adam Kilgarriff, Vı́t Baisa, Jan Bušta, Miloš           Alice Ruggeri, Luigi Di Caro, and Guido Boella. 2019.
  Jakubı́ček, Vojtěch Kovář, Jan Michelfeit, Pavel      The role of common-sense knowledge in assessing
  Rychlý, and Vı́t Suchomel. 2014. The sketch en-          semantic association. Journal on Data Semantics,
  gine: Ten years on. The Lexicography, 1(1):7–36.          8(1):39–56.
Bianca Scarlini, Tommaso Pasini, and Roberto Navigli. 2020. SensEmBERT:
  Context-Enhanced Sense Embeddings for Multilingual Word Sense
  Disambiguation. In Proceedings of the 34th Conference on Artificial
  Intelligence. Association for the Advancement of Artificial Intelligence.

Robert Speer, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An
  open multilingual graph of general knowledge. In Thirty-First AAAI
  Conference on Artificial Intelligence.

William A Woods. 1975. What’s in a link: Foundations for semantic
  networks. In Representation and understanding, pages 35–82. Elsevier.

Michael Zock and Chris Biemann. 2020. Comparison of different lexical
  resources with respect to the tip-of-the-tongue problem. Journal of
  Cognitive Science, 21(2):193–252.