<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reducing polysemy in WordNet</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kanjana Jiamjitvanich</string-name>
          <email>kanjana@disi.unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mikalai Yatskevich</string-name>
          <email>yatskevi@disi.unitn.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information and Communication Technology, University of Trento</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>WordNet [4] is a lexical database of the English language. A synset is the WordNet structure for storing word senses: it contains a set of synonymous words together with a brief description called a gloss. For example, well, wellspring and fountainhead have the same meaning according to WordNet, so these three words are grouped into one synset, which is explained by the gloss "an abundant source". A known problem of WordNet is that its sense definitions are too fine-grained. For instance, it does not distinguish between homographs (words that have the same spelling but different meanings) and polysemes (words that have related meanings). We propose to distinguish only between polysemes within WordNet while merging all homograph synsets. The ultimate goal is to compute a more coarse-grained version of the linguistic database.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2 Meta matcher</title>
      <p>
        The meta matcher is designed as a WordNet matcher, i.e., a matcher that is effective in
matching WordNet with itself. It utilizes an extensible set of element-level matchers (see
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for an extensive discussion) and combines their results in a hybrid manner, i.e., the
final score is computed from the scores of independently executed matchers.
      </p>
      <p>We implemented three element-level matchers.</p>
      <p>WordNet relation matcher (WNR). WNR takes two senses as input and obtains the
two sets of senses connected to the input senses by a given relation. These two sets
are then compared using the well-known Dice coefficient.</p>
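      <p>As an illustration, the Dice coefficient of two sets A and B is 2|A ∩ B| / (|A| + |B|). A minimal sketch in Python (the sense identifiers and hypernym sets below are hypothetical, not actual WordNet data):</p>

```python
def dice(a, b):
    """Dice coefficient between two sets: 2*|A intersect B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a.intersection(b)) / (len(a) + len(b))

# Hypothetical hypernym sets obtained for two input senses.
hypernyms_s1 = {"entity.n.01", "object.n.01", "source.n.01"}
hypernyms_s2 = {"entity.n.01", "source.n.01", "beginning.n.04"}
print(dice(hypernyms_s1, hypernyms_s2))  # 2*2/(3+3) = 0.666...
```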
      <p>Part of speech context (POSC). The POSC matcher exploits part-of-speech (POS) and
sense-tagged corpora for similarity computation. In particular, for each occurrence of a
WordNet sense within the corpora, the set of POS tags in the immediate vicinity of the sense
is memorized. Given multiple occurrences of a sense within the corpora, each sense is thus
associated with a set of POS contexts. The similarity between two senses is then computed
as the set similarity between the sets of POS contexts associated with them.</p>
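      <p>A minimal sketch of this idea (the window size, the tuple representation of a context, and the use of the Dice coefficient as the set similarity are our assumptions; the paper does not fix them):</p>

```python
def pos_contexts(tagged_sents, sense, window=2):
    """Collect the POS-tag contexts of every occurrence of `sense`.

    Each sentence is a list of (sense_id, pos_tag) pairs; a context is
    the pair (POS tags to the left, POS tags to the right).
    """
    contexts = set()
    for sent in tagged_sents:
        for i, (sid, _) in enumerate(sent):
            if sid == sense:
                left = tuple(pos for _, pos in sent[max(0, i - window):i])
                right = tuple(pos for _, pos in sent[i + 1:i + 1 + window])
                contexts.add((left, right))
    return contexts

def posc_similarity(ctx_a, ctx_b):
    """Set similarity (here: Dice) between two sets of POS contexts."""
    if not ctx_a and not ctx_b:
        return 0.0
    return 2 * len(ctx_a.intersection(ctx_b)) / (len(ctx_a) + len(ctx_b))
```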
      <p>
        Inverted sense index inexact (ISII). The ISII matcher exploits the sense-tagged WordNet
3.0 glosses for similarity computation. In particular, for each occurrence of a WordNet sense
within the sense-tagged glosses, the synset of the tagged gloss is memorized. Then,
senses are compared by comparing the sets of synsets associated with them. We compare
synsets using the well-known Resnik similarity measure [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
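      <p>Resnik similarity defines the similarity of two synsets as the information content IC(c) = -log p(c) of their most specific common subsumer. A toy sketch (the hypernym chains and probabilities are illustrative values, not WordNet data):</p>

```python
import math

# Illustrative hypernym chains (most specific first) and corpus
# probabilities; real values would come from WordNet and a tagged corpus.
hypernyms = {
    "fountainhead.n.01": ["source.n.01", "entity.n.01"],
    "well.n.01": ["source.n.01", "entity.n.01"],
    "book.n.01": ["entity.n.01"],
}
prob = {"source.n.01": 0.01, "entity.n.01": 1.0}

def resnik(c1, c2):
    """Resnik similarity: IC of the most specific common subsumer."""
    common = [h for h in hypernyms[c1] if h in hypernyms[c2]]
    if not common:
        return 0.0
    return -math.log(prob[common[0]])

print(resnik("fountainhead.n.01", "well.n.01"))  # -log(0.01) = 4.605...
```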
      <p>The matching process is organized in two steps.</p>
      <sec id="sec-1-1">
        <title>2.1 Element level matchers threshold learning</title>
        <p>The necessary prerequisite for this step is a training dataset, i.e., (a part of) the matching
task for which a human alignment H is known. All element-level matchers are then
executed on the training dataset, so that we obtain a complete set of correspondences M for all
matchers. Then the threshold learning procedure is executed. It performs an exhaustive
search through all threshold combinations for all element-level matchers. Thus, we
can select the thresholds that maximize a given matching quality metric, e.g., Recall.</p>
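        <p>A sketch of the exhaustive threshold search (the grid of candidate thresholds, the choice of F-Measure as the quality metric, and the example scores are our assumptions):</p>

```python
from itertools import product

def f_measure(predicted, gold):
    """Quality metric computed against the human alignment H."""
    tp = len(predicted.intersection(gold))
    if tp == 0:
        return 0.0
    p, r = tp / len(predicted), tp / len(gold)
    return 2 * p * r / (p + r)

def learn_thresholds(scores, gold, grid=(0.3, 0.5, 0.7, 0.9)):
    """Exhaustively try every threshold combination for all matchers.

    scores: {matcher_name: {correspondence: confidence}}.  A
    correspondence is kept if some matcher scores it at or above that
    matcher's threshold (union combination, as in the paper).
    """
    names = list(scores)
    best_q, best = -1.0, None
    for combo in product(grid, repeat=len(names)):
        predicted = {c for n, t in zip(names, combo)
                     for c, s in scores[n].items() if s >= t}
        q = f_measure(predicted, gold)
        if q > best_q:
            best_q, best = q, dict(zip(names, combo))
    return best, best_q
```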
        <p>In the case of several matchers, the system result set S is obtained from their results
through a combination strategy, namely a function that takes the matcher results as input
and produces a binary decision on whether a given correspondence holds. In this
paper we used the union of all matcher results as the combination strategy, i.e., if a given
correspondence is returned by at least one matcher, it is included in S.</p>
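        <p>The union combination strategy can be sketched as follows (the per-matcher result sets are hypothetical):</p>

```python
def union_combine(matcher_results):
    """Union combination strategy: a correspondence enters the system
    result set S if at least one matcher returned it."""
    s = set()
    for result in matcher_results:
        s.update(result)
    return s

# Hypothetical per-matcher result sets.
wnr = {("well.n.01", "well.n.02")}
posc = {("well.n.01", "well.n.02"), ("bank.n.01", "bank.n.09")}
isii = set()
print(union_combine([wnr, posc, isii]))
```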
      </sec>
      <sec id="sec-1-2">
        <title>2.2 Hybrid matching</title>
        <p>In this step the meta matcher is executed on the testing dataset. The element-level
matcher results are combined using the thresholds and the combination strategy obtained in
the previous step. For the union combination strategy, a positive result is produced only if
the confidence score, as computed by an element-level matcher, is higher than the threshold
learned in the previous step.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3 Evaluation results</title>
      <p>We used the dataset exploited in the SemEval1 evaluation. The dataset contains 1108 nouns,
591 verbs, 262 adjectives and 208 adverbs. We split it into two equal parts: a training
and a testing dataset.</p>
      <p>
        We compared the results of the meta matcher with three other sense merging methods. In
particular, we re-implemented the sense merging algorithm of [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the Genclust algorithm [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and the
MiMo algorithm [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The meta matcher outperforms the other methods in terms of
F-Measure.
1 http://lcl.di.uniroma1.it/coarse-grained-aw/index.html
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Giunchiglia</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Yatskevich</surname>
          </string-name>
          .
          <article-title>Element level semantic matching</article-title>
          .
          <source>In The Semantic Web - ISWC 2004: Third International Semantic Web Conference, Proceedings</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>W.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hemayati</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Semantic-based grouping of search engine results using wordnet</article-title>
          .
          <source>In 9th Asia-Pacific Web Conference (AP-Web/WAIM'07)</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>R.</given-names>
            <surname>Mihalcea</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Moldovan</surname>
          </string-name>
          .
          <article-title>Automatic generation of a coarse-grained wordnet</article-title>
          .
          <source>In NAACL Workshop on WordNet</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>G.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>WordNet: An Electronic Lexical Database</article-title>
          . MIT Press,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>W.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Peters</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Vossen</surname>
          </string-name>
          .
          <article-title>Automatic sense clustering in eurowordnet</article-title>
          .
          <source>In Proceedings of LREC'98</source>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>P.</given-names>
            <surname>Resnik</surname>
          </string-name>
          .
          <article-title>Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          ,
          <volume>11</volume>
          :
          <fpage>95</fpage>
          -
          <lpage>130</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>