<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Discrimination of Word Senses with Hypernyms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Artem Revenko</string-name>
          <email>artem.revenko@semantic-web.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Victor Mireles</string-name>
          <email>victor.mireles-chavez@semantic-web.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Semantic Web Company</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Languages are inherently ambiguous. Four out of five words in English have more than one meaning. Nowadays there is a growing number of small proprietary thesauri used for knowledge management in different applications. In order to enable the usage of these thesauri for automatic text annotation, we introduce a robust method for discriminating word senses using hypernyms. The method uses collocations to induce word senses and to discriminate the thesaural sense from the other senses by utilizing hypernym entries taken from a thesaurus. The main novelty of this work is the usage of hypernyms already at the stage of sense induction. The hypernyms enable us to cast the task to a binary scenario, namely teasing apart thesaural senses from all the rest. The introduced method outperforms the baseline and indicates accuracy above 80%.</p>
      </abstract>
      <kwd-group>
        <kwd>thesaurus</kwd>
        <kwd>controlled vocabulary</kwd>
        <kwd>word sense induction</kwd>
        <kwd>entity linking</kwd>
        <kwd>named entity disambiguation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
Information retrieval can successfully provide services like search or
recommendation only once different senses in the corpus are distinguished. Studies show
that in the everyday usage of English about 80% of words are ambiguous [
        <xref ref-type="bibr" rid="ref19">19</xref>
].
Even when controlled vocabularies are available, it is often the case that a
label representing a concept has "non-technical" senses, and these senses are also
present in the given corpus. Thus, the task of word sense disambiguation
(WSD) is still called for in the presence of controlled vocabularies.
A controlled vocabulary refers here to a finite, well-specified set of concepts that
typify a specific domain. Each concept has an associated set of labels. A label
can be a word in a broad sense, i.e. it may be a word or words habitually used in a
(natural) language, an acronym, or even an arbitrary sequence of symbols taken
from a chosen alphabet. A controlled vocabulary can thus be used to capture
synonymy, by grouping different synonyms as labels of the same concept. Usually
we represent a concept by one of its labels that is chosen in advance and is called
the preferred label. Since controlled vocabularies express no semantic relationships
between concepts, their use in disambiguating senses is limited.
      </p>
      <p>In order to enrich controlled vocabularies with semantic information, often the
mutually inverse relations of hyponym vs. hypernym are considered. The
hypernym of a concept x is defined as a concept whose meaning includes the meaning
of x, so that any concrete entity that is an instance of concept x is also an
instance of its hypernym. We call a thesaurus a tuple consisting of a vocabulary
and a list of hypernym/hyponym relations between its concepts.</p>
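      <p>As an illustration, such a (vocabulary, relations) tuple can be sketched in a few lines of Python; the concept identifiers and field names below are purely illustrative, not part of any standard:</p>

```python
# Minimal sketch of a thesaurus as a (vocabulary, relations) tuple.
# Concept IDs and field names are illustrative, not from the paper.

# Vocabulary: each concept has a set of labels and one preferred label.
vocabulary = {
    "c:bond": {"pref_label": "Bond", "labels": {"Bond", "bonds"}},
    "c:securities": {"pref_label": "Securities", "labels": {"Securities"}},
}

# Hypernym relations: concept -> set of its broader (hypernym) concepts.
hypernyms = {"c:bond": {"c:securities"}}

def hypernyms_of(concept_id):
    """Return the hypernym concepts of a given concept (empty set if none)."""
    return hypernyms.get(concept_id, set())

thesaurus = (vocabulary, hypernyms)
```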
      <p>
        In particular, thesauri encoding these relations via the skos:narrower and
skos:broader predicates following the standardized SKOS vocabulary [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] are
commonly found in industry [15, p. 169]<sup>1</sup>, in part because they can be built with
little effort for domain-specific use-cases. This is in contrast to more general
knowledge graphs, which can aid in disambiguating senses (often at high
computational cost [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), but they are also notably costly to compile, and thus not
widely adopted.
      </p>
      <p>In order to leverage the knowledge encoded in a thesaurus for tackling WSD, we
begin by noting the following:
– Ambiguity is a characteristic of words: concepts are non-ambiguous. Thus, it
is enough to map a word onto a concept in order to disambiguate its sense.
Many concepts, however, including those not present in a thesaurus, may
have the same label.</p>
      <p>
        Example 1. The STW thesaurus [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] includes a concept with the label Bond
(http://zbw.eu/stw/descriptor/12234-1),
which is a hyponym of a concept with the label Securities and a
hypernym of a concept with the label Asset-backed securities. Yet, in the phrase
"We invite our top investors to bond while playing golf", the word bond does
not refer to the thesaural concept.
– In the case of domain-specific analysis of texts, it is sufficient to determine
whether a given word is being used in the sense encoded in the thesaurus,
referred to as the thesaural sense, or not. Casting this task to a binary task
also simplifies it. Consequently, we are not interested in finding all the possible
senses and disambiguating them. Rather, we aim at disambiguating correctly
only one chosen (thesaural) sense.
      </p>
      <sec id="sec-1-1">
        <title>Problem Statement</title>
        <p>We present in this work a method that tackles the domain-specific case of WSD.
It is noteworthy that this method does not require all possible senses of a word
to be contained in the thesaurus. This makes it especially useful in industrial
environments, where usually only small thesauri are available.</p>
        <p>Specifically, we take as input a corpus, a thesaurus, and a concept from the
thesaurus, one of whose labels is found throughout the corpus. We call this
concept the target entity. The problem that we deal with is to distinguish, for
each document in the corpus, whether the target entity is used in the thesaural
sense or not. Thus, the end result is a partition of the corpus into two disjoint
collections: "this" and "other". The collection "this" contains the documents
that feature the target entity in the thesaural sense.
<sup>1</sup> Though strictly speaking the semantics of the broader/narrower relation can be more
general than hypernym/hyponym; for example, the meronym relation can also be
encoded as broader/narrower.
The contributions of this work are the following:
– We introduce a method for Word Sense Induction (WSI) with the usage of
hypernyms.
– We introduce a pipelined workflow to discriminate between thesaural and
non-thesaural senses of the target entity, by utilizing its hypernyms.
– We prepare and carry out an experiment that resembles a real-world use
case: concepts have multiple labels and the corpus is ambiguous.</p>
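        <p>The task stated above can be sketched as a corpus-partitioning routine; the stand-in classifier below is only a placeholder for the method developed later, not the method itself:</p>

```python
# Sketch of the task interface: given a corpus and a per-document decision,
# partition documents into "this" (thesaural sense) and "other".
# The lambda classifier below is a toy stand-in, not the paper's method.

def partition_corpus(corpus, classify):
    """Split documents by whether the target entity is used in the
    thesaural sense (classify returns True) or not."""
    this, other = [], []
    for doc in corpus:
        (this if classify(doc) else other).append(doc)
    return {"this": this, "other": other}

docs = ["bonds are securities traded on markets",
        "investors bond while playing golf"]
# Toy stand-in: call it thesaural if a hypernym word occurs in the text.
result = partition_corpus(docs, lambda d: "securities" in d)
```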
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        The problem of word sense disambiguation has attracted much attention for
several decades now. We refer the reader to [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for a review. Here we would
like to mention that one can roughly distinguish three classes of approaches to
solving WSD: supervised, unsupervised, and knowledge-based. The unsupervised
methods are usually characterized by:
– No need for external knowledge; therefore, the system is broadly applicable.
However, the results depend heavily on the corpus.
– No expected user involvement.
      </p>
      <p>Our method makes use of the knowledge in a thesaurus; in this sense it is
knowledge-based. However, it shares many features of the unsupervised methods in that it
does not expect any information about the non-thesaural senses (not even their
number) and no user involvement is expected.</p>
      <p>Among the large amount of research that has been done on this problem, there
is some work that is tightly connected to the approach presented in this paper;
we describe it in the subsequent subsections.</p>
      <sec id="sec-2-1">
        <title>Knowledge-Based Methods a.k.a. Named Entity Linking</title>
        <p>
          The knowledge-based methods [
          <xref ref-type="bibr" rid="ref17 ref18">18, 17</xref>
          ] are based on large static knowledge
graphs (KG) that are computed a priori, usually with great effort. Such a graph
can be, for example, WordNet [
          <xref ref-type="bibr" rid="ref12 ref23">12, 23</xref>
          ]. These graphs are assumed to include all
senses of the target word, either explicitly, as labels of the nodes, or implicitly. A corpus
is then analyzed and, depending on it, the KG is traversed (e.g. with a random
walk [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]) and the possible senses are thus discovered. It is important to point
out that using a KG exclusively, without the aid of the corpus, is known to lead to
poor results, especially in domain-specific cases in which the thesaural sense is
unlikely to be part of the knowledge base.
        </p>
        <p>In the problem statement we focus on a single target entity. One can naturally
extend the method and run it for all the identified named entities to link each to
the correct sense in the KG. The method would benefit from the hypernyms
contained in the KG. However, our method makes use of the corpus; therefore it
may not be appropriate for disambiguating very short strings, like search queries.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Word Sense Induction</title>
        <p>The task of WSI is a preliminary step for tackling WSD. This task consists of
finding (inducing) the senses present in unannotated data. In the terms described
above, the input to this task is a corpus and a target word. The outcome is an
enumeration of all the senses found for the target word.</p>
        <p>
          Of the many ways the WSI problem has been approached, it is important to
mention two in the context of this work. The first is so-called sense embedding
[
          <xref ref-type="bibr" rid="ref13 ref9">13, 9</xref>
          ]. These methods follow the distributional assumption, according to which
similar words appear in similar contexts [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], and they apply skip-gram models to
reduce the dimensionality of related words and assign them to a given word as its
'semantic signature'. These context embeddings are then clustered to determine
the different senses of each word. We should mention that, in our experience,
these methods fail in cases where two senses of a word lead to very similar
contexts, such as "Americano", which can refer to either a cocktail or a coffee.
The second common approach to the WSI task which is of interest here is to
analyze the text and extract from it collocation graphs: graphs whose nodes are
words found in the text close to the target word, and whose edges are weighted
to reflect the strength of this collocation. This has the advantage that previously
unknown senses can be induced and described with the help of collocations. To
weight the edges of the graph, conditional probabilities [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], Dice scores [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] or
word co-occurrences relative to a reference corpus [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] have been used. Moreover,
discrete features derived from the syntactic use of each word (e.g. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]) may
be used. In the current paper we abstain from relying on any language-specific
tools, such as part-of-speech taggers or sentence parsers, in order to stay
language-independent. We do make use of stemmers; however, they only help to improve
results and are not essential for the introduced methods. Moreover, stemmers
are among the simplest language-specific tools and are widely available for many
languages.
        </p>
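        <p>As a concrete illustration of collocation weighting, the sketch below computes the classical Dice score 2·co(x, y)/(f(x) + f(y)) over context windows; the cited works use variants of this measure, so this is a simplification under that assumption:</p>

```python
from collections import Counter
from itertools import combinations

def dice_scores(contexts):
    """Weight word pairs by the Dice score 2*co(x,y)/(f(x)+f(y)),
    counting co-occurrence within each context window."""
    freq, co = Counter(), Counter()
    for ctx in contexts:
        words = set(ctx)
        freq.update(words)
        co.update(frozenset(p) for p in combinations(sorted(words), 2))
    return {pair: 2 * c / (freq[x] + freq[y])
            for pair, c in co.items()
            for x, y in [tuple(pair)]}

# Toy contexts around the ambiguous word "americano".
scores = dice_scores([["americano", "espresso", "coffee"],
                      ["americano", "campari", "vermouth"],
                      ["espresso", "coffee"]])
```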
        <p>
          The next step in collocation-based graph algorithms is to identify a collection
of sets of nodes, each of which corresponds to a sense. Once this is done, WSD can be
carried out by clustering the contexts where the target word appears. There
have been several approaches to identifying these senses. Graph clustering via
various algorithms has been a popular approach (e.g. in [
          <xref ref-type="bibr" rid="ref16 ref5 ref7">7, 5, 16</xref>
          ]). Another
approach, built specifically for inducing senses, is HyperLex [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], which defines
senses as hubs (highly connected nodes) in the collocation graph along with
their immediate neighbors. To identify senses, HyperLex sorts the nodes of the
collocation graph by degree. Senses are induced by taking and removing one by
one the hubs from the list, along with their immediate neighbors. In this paper we
use a variant of HyperLex introduced in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] with PageRank scores in place of
degrees.
        </p>
        <p>
          Other works, such as [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], use hypernyms after the senses are already induced.
This work, in contrast, takes hypernyms into account already at the stage of
graph clustering. That way we guarantee that
– the thesaural meaning of the target word is captured in a single sense, and
– the level of granularity of the sense is defined by the structure of the
thesaurus.
        </p>
        <p>We note that in the case of collocation graphs, there is no explicit description
of what a particular sense is. This can be contrasted with the discussion in the
previous section, in which senses are defined as concepts in a thesaurus with
known hypernym/hyponym relations, or with the static KG-based approach to
entity linking, in which senses are pre-existing entities in the graph. This lack of
interpretability of a sense is overcome in this work by relating at least one of the
collocation-derived senses to a concept in the thesaurus.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <p>
        In this section we introduce a method to solve the task introduced in Section 1.1.
We rely on the "one sense per document" assumption [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] and on the assumption
that the sense of the entity can be unambiguously deduced from its hypernyms
or, more generally, from a thesaurus.
      </p>
      <p>Our method consists of two steps:
1. WSI; the outcome is a set of senses with one distinguished thesaural sense.
2. WSD, i.e. classification of each occurrence of the target word into one of the
senses.</p>
      <sec id="sec-3-1">
        <title>WSI with Hypernyms</title>
        <p>
          As already mentioned in the introduction, for the WSI task we use HyperLex
[
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], implementing it in a similar way to [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. However, we also introduce
hypernyms into the WSI process; therefore we will denote the new modification
HyperHyperLex. The process is presented in Figure 1.
        </p>
        <p>In the first step we compute the graph of collocations between all the tokens
in the text. In this implementation we use only unigrams; however, the usage
of n-grams would clearly improve the performance. In Figure 1 this phase is
represented as a graph of collocations; "E" stands for the target entity.
In the second step we compute the PageRank of the nodes in the graph. The
nodes with the highest PageRank have a thicker border in the second subfigure
of Figure 1.</p>
        <p>In the third step we introduce the hypernyms into the process and for each node
we compute the following measure:</p>
        <p>m(n) := PR(n) · ( CO(E, n) + (1/|H|) · Σ<sub>h ∈ H</sub> CO(h, n) ),
where PR stands for PageRank, H is the set of hypernyms, and CO is the collocation
measure. We use a variant of Dice scores [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] as the word-association measure to
capture collocations. We sort the nodes with respect to m(n).
        </p>
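        <p>A minimal sketch of the measure m(n), assuming PageRank scores and a collocation function CO are already computed, and assuming the bracketing in which the averaged hypernym collocations are added to CO(E, n) before multiplying by the PageRank; all names and toy weights are illustrative:</p>

```python
def m_score(node, pagerank, co, target, hypernyms):
    """m(n) = PR(n) * (CO(E, n) + mean over hypernyms h of CO(h, n)),
    boosting nodes that collocate with the target entity and its hypernyms.
    The grouping of terms is one plausible reading of the formula."""
    hyper_term = sum(co(h, node) for h in hypernyms) / len(hypernyms)
    return pagerank[node] * (co(target, node) + hyper_term)

# Toy example with hand-set scores; "drink" stands in for a hypernym label.
pr = {"mix": 0.2, "campari": 0.3}
def co(a, b):
    return {("E", "mix"): 0.4, ("drink", "mix"): 0.6,
            ("E", "campari"): 0.5, ("drink", "campari"): 0.1}.get((a, b), 0.0)

m_mix = m_score("mix", pr, co, "E", ["drink"])          # 0.2 * (0.4 + 0.6)
m_campari = m_score("campari", pr, co, "E", ["drink"])  # 0.3 * (0.5 + 0.1)
```

Note how the hypernym term lets the lower-PageRank node "mix" overtake "campari", mirroring Example 2 below.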
        <p>In the fourth step we take the first node in the sorted list and identify
it as a hub. Next we build a cluster around the hub. For this purpose we take
each node and assess its involvement in the cluster by summing the Dice scores
between itself and the hub, the direct neighbors of the hub, and the hypernyms.
This sum is then normalized by the sum of the Dice scores between the node
and all of its neighbors. Therefore, the involvement of a node is a number between
0 and 1. In contrast to the original HyperLex method, in HyperHyperLex the
nodes belong to the clusters to some degree.</p>
        <p>After the first cluster is induced we return to the sorted list and take the next
node. If the node is not yet involved in other clusters (i.e. its involvement does
not surpass a certain threshold) we take it as the next hub. When building the
cluster around this hub, we also take only nodes whose involvement in other
clusters does not surpass the threshold. When this process selects as hub a node
that is not in the thesaurus, the only difference in the above procedure is that
its hypernyms are not taken into account when defining the involvement of the
nodes in its cluster.</p>
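        <p>The involvement computation can be sketched as follows, assuming the collocation graph is stored as an adjacency dictionary of Dice scores; the helper name and toy weights are hypothetical:</p>

```python
def involvement(node, cluster_nodes, graph):
    """Fraction of the node's total collocation weight that connects it to
    the cluster (hub, its neighbors, hypernyms); a value in [0, 1]."""
    neigh = graph.get(node, {})
    total = sum(neigh.values())
    if total == 0:
        return 0.0
    return sum(w for other, w in neigh.items() if other in cluster_nodes) / total

# "soda" spends 0.8 of its collocation weight inside the cluster {mix, campari}.
graph = {"soda": {"mix": 0.6, "campari": 0.2, "tax": 0.2}}
inv = involvement("soda", {"mix", "campari"}, graph)
```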
        <p>After this process, each node has a score denoting its membership in a cluster.
This score is used in the next step for classifying the occurrences of the target
entity. The score is defined by</p>
        <p>s(n) := PR(n) · I(n),
where I stands for the involvement of the node in said cluster.
Example 2 (Americano). After calculating the collocation graph and sorting
according to PageRank we get the following ranking of potential hubs:
1. "campari",
2. "mix",
3. "espresso".</p>
        <p>However, after taking hypernyms into account, the highest m score is obtained by
"mix". Therefore, the hypernym-induced hub is "mix"; "campari" participates
in this sense as it is one of the collocations of the hub. Therefore, after resorting,
"espresso" gets the highest score and becomes the first non-hypernym-induced hub
and the second overall.</p>
        <p>In preliminary tests we found that several senses corresponding to the target
sense could be induced. However, none of the senses captured the thesaural
sense completely, resulting in misclassification. With the help of the hypernyms
it has become possible to capture the thesaural sense in a single sense and to
take into account the decisions of the data architect. In Figure 2 the sense
induced with hypernyms is denoted as a dashed circle vs. two solid circles induced
without hypernyms. The hypernym-induced sense contains more words than the
thesaural sense. Yet, the intersection of the thesaural and hypernym-induced
senses is larger than that of the thesaural and non-hypernym-induced senses.
This decreases the classification error.
Each induced sense is represented as a dictionary in which
– the keys are the words that belong to the sense cluster,
– the values are the scores s(n), i.e. the weights of the words with respect to the
sense.</p>
        <p>In order to disambiguate an occurrence of the target word we first extract its
context. A context is a set of words surrounding the target word, for example,
10 words before and 10 words after the target. Then we compute the sum of the
words' scores in the context with respect to each sense. For each sense we take
the corresponding dictionary and sum up the scores of all the words that are
found both in the context and in the sense cluster. Finally, we choose the sense with
the highest aggregate score.</p>
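        <p>The classification step just described can be sketched as follows, with senses represented as word-to-score dictionaries:</p>

```python
def disambiguate(context_words, senses):
    """Pick the sense whose word-score dictionary gives the highest
    aggregate score over the words in the context window."""
    def score(sense_dict):
        return sum(sense_dict.get(w, 0.0) for w in context_words)
    return max(senses, key=lambda name: score(senses[name]))

# Toy sense dictionaries for "americano"; scores are hand-set s(n) stand-ins.
senses = {"thesaural": {"espresso": 0.9, "coffee": 0.7},
          "other": {"campari": 0.8, "vermouth": 0.6}}
best = disambiguate(["hot", "espresso", "coffee"], senses)
```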
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiment</title>
      <sec id="sec-4-1">
        <title>Data</title>
        <p>
          We conduct two separate experiments with very similar setups: cocktails and
MeSH [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
        <p>All the data (corpora and thesauri) can be found at
github.com/artreven/thesaural_wsi. The two examples correspond quite well to those found in
industrial settings in terms of size and depth of thesaurus and quantity, size, and
quality of texts.</p>
        <p>Thesauri. In the cocktails example the concepts are taken from the "All about
cocktails"<sup>2</sup> thesaurus. The thesaurus contains various cocktail instances and
ingredients. We have only used ambiguous cocktail names for this experiment. We
use the concept scheme name "cocktail" and the broaders of the target concept
as hypernyms.</p>
        <p>In the MeSH example we use the MeSH<sup>3</sup> thesaurus. We use the preferred labels
of the broader concepts as hypernyms.</p>
        <p>As we only use unigrams in our code, we split compound labels into unigrams
and use only the nouns in both experiments.</p>
        <p>
          Corpora. We used the Wikilinks dataset [
          ] to extract the corpora. The dataset
contains documents and links to Wikipedia pages inside the documents.
The texts contain mistakes, which makes them particularly suitable for
simulating a real-world use case. The corpora contain duplicates. The data is used as
is, without any cleaning.
        </p>
        <p>First we identify those cocktail names that are ambiguous and then collect
the texts that mention those names. We remove all the texts that refer to the
disambiguation page at Wikipedia and all the categories containing fewer than 5
documents. We apply the same procedure for finding ambiguous labels from MeSH.
The preprocessing phase includes removing stopwords, stemming, removing words
that appear fewer than 3 times, removing words that appear in more than half
of the documents, and substituting all digits with a special token.
<sup>2</sup> vocabulary.semantic-web.at/cocktails
<sup>3</sup> www.nlm.nih.gov/mesh/</p>
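        <p>The preprocessing steps listed above might be sketched as follows; the tokenizer, the identity stemmer, and the NUM placeholder token are stand-ins, while the thresholds (3 occurrences, half of the documents) follow the text:</p>

```python
import re
from collections import Counter

def preprocess(docs, stopwords, stem=lambda w: w,
               min_count=3, max_doc_frac=0.5):
    """Remove stopwords, stem, drop words occurring fewer than min_count
    times, drop words present in more than max_doc_frac of the documents,
    and map digit tokens to a special NUM token."""
    token_docs = []
    for doc in docs:
        toks = [stem("NUM" if w.isdigit() else w)
                for w in re.findall(r"\w+", doc.lower()) if w not in stopwords]
        token_docs.append(toks)
    counts = Counter(t for d in token_docs for t in d)
    doc_freq = Counter(t for d in token_docs for t in set(d))
    keep = {t for t in counts
            if counts[t] >= min_count
            and doc_freq[t] <= max_doc_frac * len(docs)}
    return [[t for t in d if t in keep] for d in token_docs]

# Toy corpus: "market" is too frequent across documents, rare words are dropped.
docs = ["the bond bond market", "bond market 2020", "market trends", "golf course"]
out = preprocess(docs, {"the"})
```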
        <p>We collect 13 corpora with a total of 1227 texts for cocktails; see Table 1 for
more details. We collect 8 corpora with a total of 784 texts for the MeSH labels; see
Table 3 for more details.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Results</title>
        <p>In the experiment we classify the word occurrences into two categories, "this"
and "other". We should note that many of the considered words have more than 2
meanings; all the non-thesaural meanings fall into the category "other". For the
baseline, everything is classified into the most popular category. This baseline
is known to be challenging. In practice there is no guarantee that the thesaural
sense would be the most popular; therefore this baseline is better than the results
one could expect in practice without WSD.</p>
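        <p>The majority-class baseline can be sketched as follows:</p>

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy obtained by classifying every occurrence into the most
    frequent category ("this" vs "other")."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)

# Toy gold labels: 3 of 4 occurrences are thesaural, so the baseline gets 0.75.
acc = majority_baseline_accuracy(["this", "this", "other", "this"])
```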
        <p>The results for cocktails are presented in Table 1. Observations:
– As can be seen from the results, even a high number of real senses does not
prevent the method from showing high accuracy.
– With large corpora the accuracy is high due to better WSI.
– Even if the number of thesaural mentions is very low ("Cosmopolitan",
"B52"), the accuracy remains high. We assume that the well-represented "other"
senses can be induced accurately and the use of hypernyms improves the
induction of the thesaural sense.
– The worst results are obtained for the corpora where the thesaural sense
(cocktail) is dominant: Vesper, Margarita, Martini. Since the other sense(s)
are underrepresented, there is a risk of inducing several senses capturing
individual contexts. In such senses many general-purpose (not sense-specific)
collocations may be present; as a result these senses may score higher in
general contexts.</p>
        <p>The results for MeSH are presented in Table 3. Observations:
– For "Warts" only one category is induced. This result is due to the
underrepresented "other" sense (5 documents). This is another risk for underrepresented
senses.
– The MeSH corpora contain many duplicates and dirty texts. For example,
sometimes the target words can be found as part of link strings or as the
name of a button. Cleaning would definitely improve the results. However,
even in the case of dirty data the method still yields acceptable results.</p>
        <p>The averaged results are presented in Table 2 and in Table 4. In both cases
HyperHyperLex outperforms the challenging baseline. The difference is even
larger when the individual texts are taken into account (micro average), because
the method benefits from having a larger corpus.</p>
        <p>We have introduced a method to automatically discriminate between thesaural
and non-thesaural usage of entities. The method does not require any senses of
the target entity to be provided in advance and does not make use of external
resources except for the thesaurus, which is considered an input parameter.
In the two experiments the new method has outperformed the baseline and has
shown an accuracy of about 0.9 and 0.75, respectively. The method is robust
and performs well even in the real-world case of dirty data.
– We performed WSI using HyperLex for comparison. The results for some
words are comparable or even better; however, for other words the accuracy
drops significantly. In most cases this was due to the fact shown in Figure 2,
namely, that there were several senses containing a mixture of the thesaural
and other senses.
– The method would benefit from using bi-grams. Indeed, all the texts contain
many significant compound named entities.
– The method would benefit from using additional NLP features such as part
of speech or dependencies. This would be especially useful when working
with small corpora.</p>
        <p>Acknowledgements. We would like to thank Noam Ordan for thoughtful
comments, careful proofreading, and valuable suggestions.</p>
        <p>The work is partially supported by the LDL4HELTA (ldl4.com) project
(EUREKA funding program, project ID 9898).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Eneko</given-names>
            <surname>Agirre</surname>
          </string-name>
          , Oier Lopez de Lacalle, and
          <string-name>
            <given-names>Aitor</given-names>
            <surname>Soroa</surname>
          </string-name>
          .
          <article-title>Random walks for knowledge-based word sense disambiguation</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>40</volume>
          (
          <issue>1</issue>
          ):
          <fpage>57</fpage>
          –
          <lpage>84</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Eneko</given-names>
            <surname>Agirre</surname>
          </string-name>
          and
          <string-name>
            <given-names>Aitor</given-names>
            <surname>Soroa</surname>
          </string-name>
          .
          <article-title>Personalizing pagerank for word sense disambiguation</article-title>
          .
          <source>In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL '09</source>
          , pages
          <fpage>33</fpage>
          –
          <lpage>41</lpage>
          ,
          Stroudsburg, PA, USA,
          <year>2009</year>
          .
          Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Sean</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alistair</given-names>
            <surname>Miles</surname>
          </string-name>
          .
          <article-title>Skos simple knowledge organization system reference</article-title>
          .
          <source>W3C recommendation, W3C</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Timo</given-names>
            <surname>Borst</surname>
          </string-name>
          and
          <string-name>
            <given-names>Joachim</given-names>
            <surname>Neubert</surname>
          </string-name>
          .
          <article-title>Case study: Publishing stw thesaurus for economics as linked open data</article-title>
          .
          <source>W3C Semantic Web Use Cases and Case Studies</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Antonio Di Marco and
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <article-title>Clustering and diversifying web search results with graph-based word sense induction</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>709</fpage>
          –
          <lpage>754</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Lee R.</given-names>
            <surname>Dice</surname>
          </string-name>
          .
          <article-title>Measures of the amount of ecologic association between species</article-title>
          .
          <source>Ecology</source>
          ,
          <volume>26</volume>
          (
          <issue>3</issue>
          ):
          <fpage>297</fpage>
          –
          <lpage>302</lpage>
          ,
          <year>1945</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Beate</given-names>
            <surname>Dorow</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dominic</given-names>
            <surname>Widdows</surname>
          </string-name>
          .
          <article-title>Discovering corpus-speci c word senses</article-title>
          .
          <source>In Proceedings of the tenth conference on European chapter of the Association for Computational Linguistics-Volume</source>
          <volume>2</volume>
          , pages
          <fpage>79</fpage>
          –
          <lpage>82</lpage>
          . Association for Computational Linguistics,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Ioannis P.</given-names>
            <surname>Klapaftis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Suresh</given-names>
            <surname>Manandhar</surname>
          </string-name>
          .
          <article-title>Word sense induction using graphs of collocations</article-title>
          .
          <source>In ECAI</source>
          , pages
          <fpage>298</fpage>
          -
          <lpage>302</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Jiwei</given-names>
            <surname>Li</surname>
          </string-name>
          and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <article-title>Do multi-sense embeddings improve natural language understanding</article-title>
          ?
          <source>arXiv preprint arXiv:1506.01070</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Ilya Sutskever, Kai Chen, Greg S. Corrado, and
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Dean</surname>
          </string-name>
          .
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          , pages
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          .
          <article-title>A quick tour of word sense disambiguation, induction and related approaches</article-title>
          .
          <source>SOFSEM 2012: Theory and Practice of Computer Science</source>
          , pages
          <fpage>115</fpage>
          -
          <lpage>129</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Roberto</given-names>
            <surname>Navigli</surname>
          </string-name>
          , Stefano Faralli, Aitor Soroa, Oier de Lacalle, and
          <string-name>
            <given-names>Eneko</given-names>
            <surname>Agirre</surname>
          </string-name>
          .
          <article-title>Two birds with one stone: learning semantic models for text categorization and word sense disambiguation</article-title>
          .
          <source>In Proceedings of the 20th ACM international conference on Information and knowledge management</source>
          , pages
          <fpage>2317</fpage>
          -
          <lpage>2320</lpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Arvind</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          , Jeevan Shankar, Alexandre Passos, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>McCallum</surname>
          </string-name>
          .
          <article-title>Efficient non-parametric estimation of multiple embeddings per word in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1504.06654</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Nelson</surname>
          </string-name>
          .
          <article-title>Medical terminologies that work: The example of MeSH</article-title>
          .
          <source>In 2009 10th International Symposium on Pervasive Systems, Algorithms, and Networks</source>
          , pages
          <fpage>380</fpage>
          -
          <lpage>384</lpage>
          , Dec
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>Julia</given-names>
            <surname>Neuschmid</surname>
          </string-name>
          , Sabrina Kirrane, Elmar Kiesling, and Thomas Thurner.
          <article-title>Propelling the Potential of Enterprise Linked Data in Austria</article-title>
          . monochrom,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Panchenko</surname>
          </string-name>
          , Eugen Ruppert, Stefano Faralli, Simone Paolo Ponzetto, and
          <string-name>
            <given-names>Chris</given-names>
            <surname>Biemann</surname>
          </string-name>
          .
          <article-title>Unsupervised does not mean uninterpretable: the case for word sense induction and disambiguation</article-title>
          .
          <source>Association for Computational Linguistics</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>Delip</given-names>
            <surname>Rao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Paul</given-names>
            <surname>McNamee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Dredze</surname>
          </string-name>
          .
          <article-title>Entity linking: Finding extracted entities in a knowledge base</article-title>
          .
          <source>In Multi-source, Multilingual Information Extraction and Summarization</source>
          , pages
          <fpage>93</fpage>
          -
          <lpage>115</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>Lev</given-names>
            <surname>Ratinov</surname>
          </string-name>
          , Dan Roth, Doug Downey, and
          <string-name>
            <given-names>Mike</given-names>
            <surname>Anderson</surname>
          </string-name>
          .
          <article-title>Local and global algorithms for disambiguation to wikipedia</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1</source>
          , pages
          <fpage>1375</fpage>
          -
          <lpage>1384</lpage>
          . Association for Computational Linguistics,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Rodd</surname>
          </string-name>
          , Gareth Gaskell, and
          <string-name>
            <given-names>William</given-names>
            <surname>Marslen-Wilson</surname>
          </string-name>
          .
          <article-title>Making sense of semantic ambiguity: Semantic competition in lexical access</article-title>
          .
          <source>Journal of Memory and Language</source>
          ,
          <volume>46</volume>
          (
          <issue>2</issue>
          ):
          <fpage>245</fpage>
          -
          <lpage>266</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>Sameer</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Amarnag</given-names>
            <surname>Subramanya</surname>
          </string-name>
          , Fernando Pereira, and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>McCallum</surname>
          </string-name>
          .
          <article-title>Wikilinks: A large-scale cross-document coreference corpus labeled via links to Wikipedia</article-title>
          .
          <source>Technical Report UM-CS-2012-015</source>
          , University of Massachusetts, Amherst,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>Jean</given-names>
            <surname>Veronis</surname>
          </string-name>
          .
          <article-title>Hyperlex: lexical cartography for information retrieval</article-title>
          .
          <source>Computer Speech &amp; Language</source>
          ,
          <volume>18</volume>
          (
          <issue>3</issue>
          ):
          <fpage>223</fpage>
          -
          <lpage>252</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>David</given-names>
            <surname>Yarowsky</surname>
          </string-name>
          .
          <article-title>Unsupervised word sense disambiguation rivaling supervised methods</article-title>
          .
          <source>In Proceedings of the 33rd annual meeting on Association for Computational Linguistics</source>
          , pages
          <fpage>189</fpage>
          -
          <lpage>196</lpage>
          . Association for Computational Linguistics,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>Zhi</given-names>
            <surname>Zhong</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hwee Tou</given-names>
            <surname>Ng</surname>
          </string-name>
          .
          <article-title>It makes sense: A wide-coverage word sense disambiguation system for free text</article-title>
          .
          <source>In Proceedings of the ACL 2010 System Demonstrations</source>
          , pages
          <fpage>78</fpage>
          -
          <lpage>83</lpage>
          . Association for Computational Linguistics,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>