<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Methodology for Large-Scale, Disambiguated and Unbiased Lexical Knowledge Acquisition Based on Multilingual Word Alignment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesca Grasso</string-name>
          <email>fr.grasso@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luigi Di Caro</string-name>
          <email>luigi.dicaro@unito.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Turin, Department of Computer Science</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In order to be concretely effective, many NLP applications require the availability of lexical resources providing varied, broadly shared, and language-unbounded lexical information. However, state-of-the-art knowledge models rarely adopt such a comprehensive and cross-lingual approach to semantics. In this paper, we propose a novel automatable methodology for knowledge modeling based on a multilingual word alignment mechanism that enhances the encoding of unbiased and naturally disambiguated lexical knowledge. Results from a simple implementation of the proposal show relevant outcomes that are not found in other resources.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Lexical resources constitute a key instrument for
many NLP tasks such as Word Sense
Disambiguation and Machine Translation. However, their
potential may vary widely depending on the nature
of the lexical-semantic knowledge they encode, as
well as on how the linguistic data are stored and
linked within the network
        <xref ref-type="bibr" rid="ref26 ref29">(Zock and Biemann,
2020)</xref>
        . The resources that are presently
available, such as WordNet
        <xref ref-type="bibr" rid="ref18">(Miller, 1995)</xref>
        , typically
encode lexical-semantic knowledge mainly in terms
of word senses, defined by textual (i.e. dictionary)
definitions, and lexical entries are linked and put in
context through lexical-semantic relations. These
relations, being only of a paradigmatic nature, are
characterized by a sharing of the same defining
properties between the words and a requirement
that the words be of the same syntactic class
        <xref ref-type="bibr" rid="ref2 ref20">(Morris and Hirst, 2004)</xref>
        . Typically related words are therefore not represented due to the absence of syntagmatic links. Additionally, word senses suffer from a lack of explicit common-sense knowledge and context-dependent information. Finally, the well-known fine granularity of word senses in WordNet
        <xref ref-type="bibr" rid="ref22">(Palmer et al., 2007)</xref>
        is due to the lack of a meaning encoding system capable of representing concepts in a flexible way. Other kinds of resources such as FrameNet
        <xref ref-type="bibr" rid="ref1">(Baker et al., 1998)</xref>
        and ConceptNet
        <xref ref-type="bibr" rid="ref27">(Speer et al., 2017)</xref>
        present the same issue, while returning different types and degrees of structural semantic information and disambiguation capabilities.
      </p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>In this contribution, we provide a novel methodology for the retrieval and representation of unbiased and naturally disambiguated lexical information that relies on a multilingual word alignment mechanism. In particular, we exploit textual resources in different languages<sup>1</sup> in order to acquire and align varied lexical-semantic material of the form &lt;target-concept, {related words}<sub>k</sub>&gt; that is common to and shared by all the k languages involved. As we demonstrate through a simple implementation, our method makes it possible to create new lexical-semantic relations between words that are not always available in other resources, as well as to perform automatic word sense disambiguation. This system therefore enhances the encoding of prototypical semantic information of concepts that is also likely to be free from strong cultural-linguistic and lexicographic biases.</p>
      <p>The benefits provided by our novel multilingual
word alignment mechanism are thus fourfold: (i)
a linguistic and lexicographic de-biasing of lexical
knowledge; (ii) naturally-disambiguated aligned
lexical entries; (iii) the discovery of novel
lexical-semantic relations; and (iv) the representation of
prototypical semantic information of concepts in
different languages.</p>
      <p><sup>1</sup> In this work, we start with the combination of three languages: English, German and Italian.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Background and Related Work</title>
      <sec id="sec-2-1">
        <title>2.1 Bias Types</title>
        <p>
          Due to its complex and fluid nature, lexical semantics needs to undergo a process of abstraction and simplification in order to be encoded into a formal model. As a result, the lexical knowledge provided by lexical resources (especially monolingual ones) will inherently carry different types of biases. In particular, (i) linguistic and (ii) lexicographic biases affect the encoding, consumption, and exploitation of lexical knowledge in downstream tasks.
        </p>
        <p>
          <bold>Linguistic bias.</bold> Lexical information encoded in a language’s lexicon, as well as the potential contexts in which a given lexeme can occur, inevitably reflects the socio-cultural background of the speakers of that language. Lexical resources used for the compilation of lexical knowledge are often conceived as monolingual; they therefore mostly return culture-bounded semantic information which does not account for more widely shared knowledge.
        </p>
        <p>
          <bold>Lexicographic bias.</bold> The nuclear components extracted from textual definitions can differ depending on the resource used, even within a single language
          <xref ref-type="bibr" rid="ref12">(Kiefer, 1988)</xref>
          . For example, the definition of “cow” reported by the Oxford Dictionary is “a large animal kept on farms to produce milk or beef”, while the Merriam-Webster Dictionary reports “the mature female of cattle”. Both endogenous and exogenous properties can be subjectively reported
          <xref ref-type="bibr" rid="ref28">(Woods, 1975)</xref>
          , such as the term “large” and the milk production respectively.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Related Work</title>
        <p>
          On one side, lexicons are built on top of synsets<sup>2</sup>
and contextualize meanings (or senses) mainly in
terms of paradigmatic relations. WordNet
          <xref ref-type="bibr" rid="ref18">(Miller,
1995)</xref>
          and BabelNet
          <xref ref-type="bibr" rid="ref21">(Navigli and Ponzetto, 2010)</xref>
          can be seen as the cornerstone and the summit in
that respect. However, if on the one hand
WordNet’s dense network of taxonomic relationships
allows a high degree of systematization, on the
other hand, a key unsolved issue with “wordnets”
is the fine granularity of their inventories. Note
that multilingualism in BabelNet is provided as an
indexing service rather than as an alignment and
unbiasing systematization method.
        </p>
        <p><sup>2</sup> Words considered as synonyms in specific contexts.</p>
        <p>
          Extensions of these resources also include
Common-Sense Knowledge (CSK), which refers
to some (to a certain extent) widely-accepted and
shared information. CSK describes the kind of
general knowledge material that humans use to
define, differentiate and reason about the
conceptualizations they have in mind
          <xref ref-type="bibr" rid="ref25 ref6">(Ruggeri et al.,
2019)</xref>
          . ConceptNet
          <xref ref-type="bibr" rid="ref27">(Speer et al., 2017)</xref>
          is one
of the largest CSK resources, collecting and
automatically integrating data starting from the
original MIT Open Mind Common Sense project<sup>3</sup>.
However, terms in ConceptNet are not
disambiguated. Property norms
          <xref ref-type="bibr" rid="ref16 ref5">(McRae et al., 2005;
Devereux et al., 2014)</xref>
          represent a similar kind of
resource, which is more focused on the cognitive
and perception-based aspects of word meaning.
Norms, in contrast with ConceptNet, are based
on semantic features empirically-constructed via
questionnaires producing lexical (often
ambiguous) labels associated with target concepts,
without any systematic methodology of knowledge
collection and encoding.
        </p>
        <p>
          Another widespread modeling approach is
based on vector space models of lexical
knowledge. Vectors are automatically learnt from large
corpora utilizing a wide range of statistical
techniques, all centered on Harris’ distributional
assumption
          <xref ref-type="bibr" rid="ref9">(Harris, 1954)</xref>
          , i.e. words that occur
in the same contexts tend to have similar
meanings. Well-known models include word
embeddings
          <xref ref-type="bibr" rid="ref17 ref23 ref3">(Mikolov et al., 2013; Pennington et al.,
2014; Bojanowski et al., 2016)</xref>
          , sense
embeddings
          <xref ref-type="bibr" rid="ref10 ref11 ref14">(Huang et al., 2012; Iacobacci et al., 2015;
Kumar et al., 2019)</xref>
          , and contextualized
embeddings
          <xref ref-type="bibr" rid="ref26">(Scarlini et al., 2020)</xref>
          . However, the
relations holding between vector representations are
not typed, nor are they organized systematically.
        </p>
        <p>
          Among the several other modeling strategies
proposed, lexicographic-centered resources have
been focused on the contextualization of lexical
items within syntactic structures, e.g. Corpus
Pattern Analysis (CPA)
          <xref ref-type="bibr" rid="ref8">(Hanks, 2004)</xref>
          , situation
frames such as FrameNet
          <xref ref-type="bibr" rid="ref1 ref7">(Fillmore, 1977; Baker
et al., 1998)</xref>
          and conceptual frames
          <xref ref-type="bibr" rid="ref15 ref19">(Moerdijk et
al., 2008; Leone et al., 2020)</xref>
          . Words are not taken
in isolation and the meaning they are attributed is
connected to prototypical patterns or typed slots.
However, these theories and methods for building
semantic resources remain tied to the lexical
basis and do not address the aforementioned biases.
        </p>
        <p><sup>3</sup> https://www.media.mit.edu/projects/open-mind-common-sense/overview/</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 The Multilingual Word Alignment</title>
      <p>As is known, a single word form can be associated with more than one related sense, causing what is referred to as semantic ambiguity, or polysemy. This phenomenon, however, manifests itself differently across languages, since each language encodes meaning into words in its own particular way. We can therefore assume that, while a given polysemous word may be ambiguous in a certain context, a semantically corresponding word in another language may not be. Based on this assumption, it is possible to exploit this cross-language property to disambiguate a given word using its semantic equivalent in another language when they both occur in the same context. Such a disambiguation process can take place because the two words exhibit different semantic (specifically, polysemous) behaviours. Accordingly, we developed a knowledge acquisition methodology that features the power of word sense disambiguation, relying on a multilingual &lt;target-concept, {related words}<sub>k</sub>&gt; alignment mechanism.</p>
      <p>After briefly presenting the languages we have selected for this first trial, we describe the methodology in more detail using a basic example. Afterwards, a simple implementation of the proposed mechanism is presented.</p>
      <sec id="sec-3-1">
        <title>3.1 Languages Involved</title>
        <p>Among the benefits provided by the multilingual word alignment methodology we propose, one is that it prevents the represented lexical information from containing strong cultural-linguistic biases. This objective is pursued through the use of three different languages, reflecting in turn three diverse backgrounds. For this first trial we involved English, German and Italian. These languages were chosen primarily because we are proficient in them; we are therefore able to exert control over the data of our trial and to interpret the results properly. At the same time, given the nature of the methodology, it was necessary to select a set of languages with a certain degree of similarity in terms of shared lexical-semantic material. Indeed, the alignment mechanism can work and be effective as long as the lexical-semantic systems of the languages involved reflect a somewhat similar cultural-linguistic background. For example, we might expect languages to agree on the meanings of “carp”, “cottage” and “sled” as long as speakers of these languages have comparable exposure to the relevant data. We would not expect a language spoken in a place without carp to have a word corresponding to “carp”. The purpose of this project is not to forcibly identify universally valid semantic relationships, but rather to avoid reporting biased information deriving from the use of data coming from a single linguistic context. For this reason, in our case the choice fell on European languages<sup>4</sup> (two Germanic languages and a Romance one).</p>
      </sec>
      <sec id="sec-3-2">
        <p>We now describe in detail the alignment mechanism through a basic example. Consider the following word forms: wool (EN); Wolle (DE); lana (IT), expressing a single target concept<sup>5</sup>.</p>
        <table-wrap>
          <label>Table 1</label>
          <caption><p>Excerpt of the unordered lists of words related to wool (EN), Wolle (DE) and lana (IT). The rows are not aligned across columns.</p></caption>
          <table>
            <thead>
              <tr><th>wool (EN)</th><th>Wolle (DE)</th><th>lana (IT)</th></tr>
            </thead>
            <tbody>
              <tr><td>sheep</td><td>Schal</td><td>cotone</td></tr>
              <tr><td>cotton</td><td>spinnen</td><td>Biella</td></tr>
              <tr><td>synthetic</td><td>Baumwolle</td><td>sintetica</td></tr>
              <tr><td>spin</td><td>Rudolf</td><td>sciarpa</td></tr>
              <tr><td>scarf</td><td>synthetisch</td><td>pecora</td></tr>
              <tr><td>mitten</td><td>Schafe</td><td>filare</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>For each of the three lexical forms we collect a
set of related words in terms of paradigmatic (e.g.
synonyms) and syntagmatic (e.g. co-occurrences)
relations. The target-related words can possibly be
modifiers, verbs, or substantives. We thus obtain
three different lists of words, one for each of the
languages involved. The retrieved terms in the lists
are still potentially ambiguous, since they refer to
a lexical form rather than to a contextually defined
concept. Table 1 provides a small excerpt of such
unordered lists of related words.</p>
        <p>The lexical data in the lists are subsequently
compared and filtered in order to select only the
semantic items that occur in all the lists, i.e., those
shared by the three languages<sup>6</sup>, in the reported
example. The resulting words are thus aligned with
their semantic counterparts, generating a set of
aligned triplets, as shown in Table 2.</p>
        <table-wrap>
          <label>Table 2</label>
          <caption><p>Aligned triplets (EN ↔ DE ↔ IT) for the target concept expressed by wool, Wolle and lana.</p></caption>
          <table>
            <thead>
              <tr><th>wool (EN)</th><th>Wolle (DE)</th><th>lana (IT)</th></tr>
            </thead>
            <tbody>
              <tr><td>sheep</td><td>Schafe</td><td>pecora</td></tr>
              <tr><td>cotton</td><td>Baumwolle</td><td>cotone</td></tr>
              <tr><td>synthetic</td><td>synthetisch</td><td>sintetica</td></tr>
              <tr><td>spin</td><td>spinnen</td><td>filare</td></tr>
              <tr><td>scarf</td><td>Schal</td><td>sciarpa</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p><sup>4</sup> By “European” we refer to the European linguistic area.</p>
        <p><sup>5</sup> An absolute monosemy is, of course, realistically unreachable.</p>
        <p><sup>6</sup> This implies the presence of a translation step.</p>
      </sec>
      <sec id="sec-3-3">
        <p>This multilingual word alignment provides, as a consequence, an automatic Word Sense Disambiguation system. Once the triplets are formed, their members will indeed be associated with a likely unique sense, i.e. the one coming from the intersection of all possible language-specific senses related to the three words. In other terms, the target-related words, once aligned, naturally identify (and provide) a common semantic context. As a consequence, potentially polysemous words are disambiguated through such context, without any support from sense repositories. For example, the context-consistent sense of the verb to spin (EN), which is a highly polysemous word in English, can be identified by selecting the only sense that is also shared by the other two aligned words, i.e. “turn fibres into thread”. In fact, neither spinnen (DE) nor filare (IT) can possibly mean e.g. “rotate”.</p>
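        <p>The intersection-based disambiguation described above can be sketched as follows. This is only a minimal illustration, not the implementation used in this work: the sense labels and the small <monospace>SENSES</monospace> inventory are hypothetical and hand-written for the example (the methodology itself relies on no sense repository), and <monospace>shared_senses</monospace> is an illustrative helper name.</p>
        <preformat>
```python
# Hypothetical, hand-written sense labels for three aligned words.
# They only illustrate the idea; the methodology itself does not
# rely on any sense repository.
SENSES = {
    ("en", "spin"): {"turn fibres into thread", "rotate", "present with a bias"},
    ("de", "spinnen"): {"turn fibres into thread", "be crazy"},
    ("it", "filare"): {"turn fibres into thread", "run smoothly"},
}


def shared_senses(triplet):
    """Return the senses common to every (language, word) pair in the triplet."""
    sense_sets = [SENSES[member] for member in triplet]
    return set.intersection(*sense_sets)


triplet = [("en", "spin"), ("de", "spinnen"), ("it", "filare")]
common = shared_senses(triplet)
print(common)  # {'turn fibres into thread'}
```
        </preformat>
        <p>In this toy inventory the only sense surviving the intersection is the textile one, which mirrors the spin/spinnen/filare example: senses available in a single language, such as “rotate”, are discarded by the alignment itself.</p>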
        <p>This mechanism generates a twofold effect: besides performing word sense disambiguation, it also provides lexical knowledge in the form of (paradigmatic and syntagmatic) lexical-semantic relations between words that is also language-unbounded. In the first place, the uncontrolled character of the data retrieval and alignment process enables the generation of novel lexical-semantic relations that are likely not available in other structured resources. Additionally, since the resulting set of words related to the target can only be the one shared by multiple languages, the lexical knowledge it encodes does not reflect a single cultural/linguistic background, but rather a common and shared one. For example, in Table 1 the presence of the word “Biella” in the list of words related to “lana” probably reflects the fact that the Italian city of Biella is (locally) famous for its wool, so the two words may co-occur frequently. Similarly, if we consider the alignment &lt;cat (EN), Katze (DE), gatto (IT)&gt;, a lexeme related to the English word form would be “rain”, due to the well-known idiom “it’s raining cats and dogs”. However, neither “Biella” nor the corresponding words for “rain” are likely to appear in the lists of related words of the respective other languages, being language-specific items within those contexts. Therefore, the lexical information provided by the alignment mechanism will be free from strong cultural-linguistic biases. Finally, as illustrated in the next section, by exploiting multiple and differently built resources, we are able to reduce arbitrariness and lexicographic biases within the lexical knowledge represented.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Implementation</title>
      <p>In this section we describe the details and results of a simple implementation of the proposed alignment mechanism for the acquisition of disambiguated and unbiased lexical information. In particular, the system is composed of two main modules: a context generation module and an alignment procedure. We finally report the results of an evaluation to highlight mainly (i) the autonomous disambiguation power of the approach, (ii) the quality of the alignments and their unbiased and syntagmatic nature, and (iii) the amount of unveiled lexical-semantic relations not covered by existing state-of-the-art resources such as BabelNet.</p>
      <table-wrap>
        <label>Table 3</label>
        <caption><p>Excerpt of automatic alignments for the concept scale (bn:00069470n).</p></caption>
        <table>
          <thead>
            <tr><th>POS</th><th>scale (en)</th><th>bilancia (it)</th><th>Waage (de)</th></tr>
          </thead>
          <tbody>
            <tr><td>noun</td><td>accuracy</td><td>precisione</td><td>Genauigkeit</td></tr>
            <tr><td>noun</td><td>balance</td><td>equilibrio</td><td>Balance</td></tr>
            <tr><td>noun</td><td>bulk</td><td>massa</td><td>Masse</td></tr>
            <tr><td>noun</td><td>control</td><td>controllo</td><td>Kontrolle</td></tr>
            <tr><td>noun</td><td>device</td><td>dispositivo</td><td>Gerät</td></tr>
            <tr><td>noun</td><td>figure</td><td>cifra</td><td>Zahl</td></tr>
            <tr><td>adj</td><td>accurate</td><td>preciso</td><td>genau</td></tr>
            <tr><td>adj</td><td>smart</td><td>intelligente</td><td>intelligent</td></tr>
            <tr><td>verb</td><td>indicate</td><td>indicare</td><td>zeigen</td></tr>
            <tr><td>verb</td><td>set</td><td>regolare</td><td>einstellen</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>[Table 4. Results of the alignment on six concepts. Columns: (en), (it), (de), triplets, novel(en), novel(it), novel(de).]</p>
      <sec id="sec-4-1">
        <p>
          To retrieve the concept-related words for the multilingual alignment we made use of two textual resources: Sketch Engine
          <xref ref-type="bibr" rid="ref13">(Kilgarriff et al., 2014)</xref>
          and the Leipzig Corpora Collection
          <xref ref-type="bibr" rid="ref24">(Quasthoff et al., 2014)</xref>
          . Through the former, we searched for related words with its “Word Sketch” tool on the TenTen Corpus Family<sup>7</sup>. In particular, we were able to automatically collect words appearing in the following grammatical relations: “modifiers of w”, “adj. predicates of w”, “verbs with w as subject” and “verbs with w as object”. The retrieved concept-related words are then lemmatized and marked with the suitable POS tags. Finally, we utilized the Leipzig Corpora Collection portal to search for additional context words in terms of left and right (POS-tagged) co-occurrences.
        </p>
        <p><sup>7</sup> https://www.sketchengine.eu/documentation/tenten-corpora</p>
        <p>
          The Google Translate API was used to find translations of related words in the three languages<sup>8</sup>. In particular, given a certain term t<sub>L1</sub> in a language L1, we opted for retrieving all its possible translations into the other two languages (L2, L3). We then tried to match each translated item with the previously retrieved sets of related words in L2 and L3. Whenever the [t<sub>L1</sub> ↔ t<sub>L2</sub>] and [t<sub>L1</sub> ↔ t<sub>L3</sub>] matches succeeded, we finally checked for any possible [t<sub>L2</sub> ↔ t<sub>L3</sub>] match. If a [t<sub>L1</sub> ↔ t<sub>L2</sub> ↔ t<sub>L3</sub>] semantic equivalence occurs, then the alignment can take place. Table 3 shows an excerpt of automatic alignments for the concept scale (bn:00069470n).
        </p>
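        <p>The matching procedure above can be sketched as follows, assuming that a translation lookup is available. In this illustration the Google Translate API is replaced by a tiny hand-written dictionary (<monospace>TRANSLATIONS</monospace>), and the function names <monospace>translate</monospace> and <monospace>align</monospace> are hypothetical; the word lists reuse part of the wool example from Section 3. It is a sketch of the pairwise checks, not the actual implementation.</p>
        <preformat>
```python
# Toy stand-in for a translation service:
# (source_lang, target_lang, word) maps to a set of candidate translations.
# The entries are hand-written assumptions, not output of the Google Translate API.
TRANSLATIONS = {
    ("en", "de", "spin"): {"spinnen", "drehen"},
    ("en", "it", "spin"): {"filare", "girare"},
    ("de", "it", "spinnen"): {"filare"},
    ("en", "de", "sheep"): {"Schafe"},
    ("en", "it", "sheep"): {"pecora"},
    ("de", "it", "Schafe"): {"pecora"},
    ("en", "de", "mitten"): {"Fausthandschuh"},
    ("en", "it", "mitten"): {"muffola"},
}


def translate(word, source, target):
    """Return all candidate translations of `word` (empty set if unknown)."""
    return TRANSLATIONS.get((source, target, word), set())


def align(related_l1, related_l2, related_l3, langs=("en", "de", "it")):
    """Build (t1, t2, t3) triplets whose members are mutual translations.

    A triplet is kept only if t1 matches t2, t1 matches t3, and t2 matches t3,
    where every member also belongs to its language's list of related words.
    """
    l1, l2, l3 = langs
    triplets = []
    for t1 in sorted(related_l1):
        # Step 1: translations of t1 that are also related words in L2 and L3.
        matches_l2 = translate(t1, l1, l2).intersection(related_l2)
        matches_l3 = translate(t1, l1, l3).intersection(related_l3)
        for t2 in sorted(matches_l2):
            # Step 2: keep only the t3 candidates that also translate t2.
            for t3 in sorted(translate(t2, l2, l3).intersection(matches_l3)):
                triplets.append((t1, t2, t3))
    return triplets


english = {"sheep", "spin", "mitten"}
german = {"Schafe", "spinnen", "Rudolf"}
italian = {"pecora", "filare", "Biella"}

aligned = align(english, german, italian)
print(aligned)  # [('sheep', 'Schafe', 'pecora'), ('spin', 'spinnen', 'filare')]
```
        </preformat>
        <p>Words without a counterpart in all three lists (here “mitten”, “Rudolf” and “Biella”) never enter a triplet, which is how language-specific items are filtered out.</p>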
      </sec>
      <sec id="sec-4-2">
        <title>4.3 Evaluation</title>
        <p>Our aim is not to outperform state-of-the-art resources but rather to incorporate new and unbiased semantic relations from a novel multilingual alignment mechanism. In particular, we wanted to verify to what extent our knowledge acquisition method is able to unveil lexical relations not yet covered by a state-of-the-art resource (BabelNet).</p>
        <p>Thus, we first generated sets of related words from BabelNet in order to compare them with those produced and aligned by our (automated) methodology. In particular, through the BabelNet API, we obtained the English, Italian, and German lexicalizations of the synsets connected to each target concept, together with the words included in their glosses<sup>9</sup>.</p>
        <p>As test cases, we randomly picked 500 concepts expressed by a polysemous word in at least one of the three languages, obtaining non-empty alignments for 456 of them. In Table 4 we report the results of the alignment on six concepts.</p>
        <p>Despite its limitations, our first implementation of the proposed methodology was able to discover a total of 76,152 multilingual alignments over the 456 concepts, with (on average) more than 80% novel semantic relations with respect to what is currently encoded in BabelNet across the three languages. Still, the extracted data represent mostly unbiased and disambiguated knowledge, leading towards the construction of a new large-scale and multilingual prototypical lexical database.</p>
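        <p>To make the notion of novelty concrete, the share of aligned words that a baseline resource does not already encode can be computed as below. This is a sketch under stated assumptions: the exact formula is not given in the text, so the ratio is only one plausible reading of “novel with respect to BabelNet”, the function name <monospace>novelty_ratio</monospace> is hypothetical, and the baseline set is a made-up stand-in for the words obtained through the BabelNet API.</p>
        <preformat>
```python
def novelty_ratio(aligned_words, baseline_words):
    """Fraction of aligned related words that are absent from the baseline set."""
    aligned = set(aligned_words)
    if not aligned:
        return 0.0
    novel = aligned.difference(baseline_words)
    return len(novel) / len(aligned)


# English members of some aligned triplets for the concept "scale" (see Table 3).
aligned_en = ["accuracy", "balance", "bulk", "control"]
# Hypothetical baseline: words assumed to be already linked to the concept.
baseline_en = {"balance", "weight", "instrument"}

ratio = novelty_ratio(aligned_en, baseline_en)
print(f"{ratio:.0%} of the aligned words are novel")  # 75% of the aligned words are novel
```
        </preformat>
        <p>Computing the ratio per language and averaging it over all concepts would give one aggregate figure per language, comparable in spirit to the novel(en), novel(it) and novel(de) columns of Table 4.</p>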
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Conclusions and Future Work</title>
      <p>
        In this paper we proposed an original methodology for acquiring and encoding lexical knowledge through a novel yet simple mechanism of multilingual alignment. The aim was to represent varied, disambiguated, and language-unbounded lexical knowledge by minimizing strong linguistic and lexicographic biases. A simple implementation and experimentation on 456 concepts led to the discovery of around 76K aligned lexical-semantic features, of which more than 80% were new when compared with a current state-of-the-art resource such as BabelNet. Future directions include the use of more languages and large-scale runs over thousands of main concepts
        <xref ref-type="bibr" rid="ref14 ref2 ref25 ref27 ref4 ref6">(Bentivogli et al., 2004; Di Caro and Ruggeri, 2019; Camacho-Collados and Navigli, 2017)</xref>
        .
      </p>
      <p><sup>8</sup> No surrounding syntactic context for the words to align was available for more advanced Machine Translation.</p>
      <p><sup>9</sup> We used the spaCy library to analyze, extract and lemmatize the text - https://spacy.io.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Collin F Baker</surname>
          </string-name>
          ,
          <string-name>
            <surname>Charles J Fillmore</surname>
          </string-name>
          , and John B Lowe.
          <year>1998</year>
          .
          <article-title>The Berkeley FrameNet project</article-title>
          .
          <source>In 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics</source>
          , Volume
          <volume>1</volume>
          , pages
          <fpage>86</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Luisa</given-names>
            <surname>Bentivogli</surname>
          </string-name>
          , Pamela Forner, Bernardo Magnini, and
          <string-name>
            <given-names>Emanuele</given-names>
            <surname>Pianta</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Revising the WordNet Domains hierarchy: semantics, coverage and balancing</article-title>
          .
          <source>In Proceedings of the workshop on multilingual linguistic resources</source>
          , pages
          <fpage>94</fpage>
          -
          <lpage>101</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          , Edouard Grave, Armand Joulin, and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>arXiv preprint arXiv:1607.04606</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Jose</given-names>
            <surname>Camacho-Collados</surname>
          </string-name>
          and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>BabelDomains: Large-scale domain labeling of lexical resources</article-title>
          .
          <source>In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers</source>
          , pages
          <fpage>223</fpage>
          -
          <lpage>228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Barry J Devereux</surname>
          </string-name>
          , Lorraine K Tyler,
          <string-name>
            <given-names>Jeroen</given-names>
            <surname>Geertzen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Billi</given-names>
            <surname>Randall</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>The CSLB concept property norms</article-title>
          .
          <source>Behavior research methods</source>
          ,
          <volume>46</volume>
          (
          <issue>4</issue>
          ):
          <fpage>1119</fpage>
          -
          <lpage>1127</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Luigi</given-names>
            <surname>Di Caro</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alice</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Unveiling middle-level concepts through frequency trajectories and peaks analysis</article-title>
          .
          <source>In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing</source>
          , pages
          <fpage>1035</fpage>
          -
          <lpage>1042</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Charles J Fillmore</surname>
          </string-name>
          .
          <year>1977</year>
          .
          <article-title>Scenes-and-frames semantics</article-title>
          .
          <source>Linguistic structures processing</source>
          ,
          <volume>59</volume>
          :
          <fpage>55</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Hanks</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Corpus pattern analysis</article-title>
          .
          <source>In Euralex Proceedings</source>
          , volume
          <volume>1</volume>
          , pages
          <fpage>87</fpage>
          -
          <lpage>98</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Zellig S Harris</surname>
          </string-name>
          .
          <year>1954</year>
          .
          <article-title>Distributional structure</article-title>
          .
          <source>Word</source>
          ,
          <volume>10</volume>
          (
          <issue>2-3</issue>
          ):
          <fpage>146</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Eric H</given-names>
            <surname>Huang</surname>
          </string-name>
          , Richard Socher,
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          , and Andrew Y Ng.
          <year>2012</year>
          .
          <article-title>Improving word representations via global context and multiple word prototypes</article-title>
          .
          <source>In Proc. of ACL</source>
          , pages
          <fpage>873</fpage>
          -
          <lpage>882</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Ignacio</given-names>
            <surname>Iacobacci</surname>
          </string-name>
          , Mohammad Taher Pilehvar, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>SensEmbed: learning sense embeddings for word and relational similarity</article-title>
          .
          <source>In Proceedings of ACL</source>
          , pages
          <fpage>95</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Ferenc</given-names>
            <surname>Kiefer</surname>
          </string-name>
          .
          <year>1988</year>
          .
          <article-title>Linguistic, conceptual and encyclopedic knowledge: Some implications for lexicography</article-title>
          . In T. Magay and J. Zigány, editors,
          <source>Proceedings of the 3rd EURALEX International Congress</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          , Budapest, Hungary, September. Akadémiai Kiadó.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Adam</given-names>
            <surname>Kilgarriff</surname>
          </string-name>
          , Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář,
          <string-name>
            <given-names>Jan</given-names>
            <surname>Michelfeit</surname>
          </string-name>
          , Pavel Rychlý, and Vít Suchomel.
          <year>2014</year>
          .
          <article-title>The Sketch Engine: Ten years on</article-title>
          .
          <source>Lexicography</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>7</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Sawan</given-names>
            <surname>Kumar</surname>
          </string-name>
          , Sharmistha Jat,
          <string-name>
            <given-names>Karan</given-names>
            <surname>Saxena</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Partha</given-names>
            <surname>Talukdar</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Zero-shot word sense disambiguation using sense definition embeddings</article-title>
          .
          <source>In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>5670</fpage>
          -
          <lpage>5681</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Valentina</given-names>
            <surname>Leone</surname>
          </string-name>
          , Giovanni Siragusa, Luigi Di Caro, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Building semantic grams of human knowledge</article-title>
          .
          <source>In Proceedings of the 12th Language Resources and Evaluation Conference</source>
          , pages
          <fpage>2991</fpage>
          -
          <lpage>3000</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Ken</given-names>
            <surname>McRae</surname>
          </string-name>
          , George S Cree, Mark S Seidenberg, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>McNorgan</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Semantic feature production norms for a large set of living and nonliving things</article-title>
          .
          <source>Behavior Research Methods</source>
          ,
          <volume>37</volume>
          (
          <issue>4</issue>
          ):
          <fpage>547</fpage>
          -
          <lpage>559</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>George A</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>WordNet: a lexical database for English</article-title>
          .
          <source>Communications of the ACM</source>
          ,
          <volume>38</volume>
          (
          <issue>11</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Fons</given-names>
            <surname>Moerdijk</surname>
          </string-name>
          , Carole Tiberius, and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Niestadt</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Accessing the ANW dictionary</article-title>
          .
          <source>In Proc. of the workshop on Cognitive Aspects of the Lexicon</source>
          , pages
          <fpage>18</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Jane</given-names>
            <surname>Morris</surname>
          </string-name>
          and
          <string-name>
            <given-names>Graeme</given-names>
            <surname>Hirst</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Non-classical lexical semantic relations</article-title>
          .
          <source>In Proceedings of the Computational Lexical Semantics Workshop at HLT-NAACL 2004</source>
          , pages
          <fpage>46</fpage>
          -
          <lpage>51</lpage>
          , Boston, Massachusetts, USA, May 2 - May 7. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          and Simone Paolo Ponzetto.
          <year>2010</year>
          .
          <article-title>BabelNet: Building a very large multilingual semantic network</article-title>
          .
          <source>In Proc. of ACL</source>
          , pages
          <fpage>216</fpage>
          -
          <lpage>225</lpage>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Martha</given-names>
            <surname>Palmer</surname>
          </string-name>
          , Hoa Trang Dang, and
          <string-name>
            <given-names>Christiane</given-names>
            <surname>Fellbaum</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Making fine-grained and coarse-grained sense distinctions, both manually and automatically</article-title>
          .
          <source>Natural Language Engineering</source>
          ,
          <volume>13</volume>
          (
          <issue>02</issue>
          ):
          <fpage>137</fpage>
          -
          <lpage>163</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Pennington</surname>
          </string-name>
          , Richard Socher, and
          <string-name>
            <given-names>Christopher D</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In EMNLP</source>
          , volume
          <volume>14</volume>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Uwe</given-names>
            <surname>Quasthoff</surname>
          </string-name>
          , Dirk Goldhahn, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Eckart</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Building large resources for text mining: The Leipzig Corpora Collection</article-title>
          .
          <source>In Text Mining</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>24</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Alice</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          , Luigi Di Caro, and
          <string-name>
            <given-names>Guido</given-names>
            <surname>Boella</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The role of common-sense knowledge in assessing semantic association</article-title>
          .
          <source>Journal on Data Semantics</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>39</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Bianca</given-names>
            <surname>Scarlini</surname>
          </string-name>
          , Tommaso Pasini, and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>SensEmBERT: Context-Enhanced Sense Embeddings for Multilingual Word Sense Disambiguation</article-title>
          .
          <source>In Proceedings of the 34th Conference on Artificial Intelligence</source>
          . Association for the Advancement of Artificial Intelligence.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Robert</given-names>
            <surname>Speer</surname>
          </string-name>
          , Joshua Chin, and
          <string-name>
            <given-names>Catherine</given-names>
            <surname>Havasi</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>ConceptNet 5.5: An open multilingual graph of general knowledge</article-title>
          .
          <source>In Thirty-First AAAI Conference on Artificial Intelligence</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>William A</given-names>
            <surname>Woods</surname>
          </string-name>
          .
          <year>1975</year>
          .
          <article-title>What's in a link: Foundations for semantic networks</article-title>
          .
          <source>In Representation and understanding</source>
          , pages
          <fpage>35</fpage>
          -
          <lpage>82</lpage>
          . Elsevier.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>Michael</given-names>
            <surname>Zock</surname>
          </string-name>
          and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Biemann</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Comparison of different lexical resources with respect to the tip-of-the-tongue problem</article-title>
          .
          <source>Journal of Cognitive Science</source>
          ,
          <volume>21</volume>
          (
          <issue>2</issue>
          ):
          <fpage>193</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>