<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Challenges for Wordnets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francis Bond</string-name>
          <email>bond@ieee.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Language, Data and Knowledge Conference Galway, Ireland</institution>
          ,
          <addr-line>18</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Maciej Piasecki Wrocław University of Science and Technology</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Nanyang Technological University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Wordnets are increasingly widely used to model word meaning in natural language processing tasks. However, there are still many challenges in accurately describing word meanings and in making these descriptions useful for both human and machine users. This workshop aims to identify, discuss and start to solve existing challenges. The idea grew from the CLARIN Workshop - Towards Interoperability of Lexico-Semantic Resources (https://www.clarin.eu/event/2017/clarin-workshop-towards-interoperability-lexico-semantic-resources), held at Tartu (2017-01). Wordnets are among the most common semantic lexical resources and now exist for a large number of languages. Many are constructed manually or under careful manual control, so their quality is much higher than that of automatically extracted resources. Wordnets can provide a core for Linked Open Data that anchors them in language resources. Different wordnets are generally similar, but combining them in a common interoperable multilingual framework still takes some effort. Doing so can be a first step and a valuable case study in developing interoperable solutions for semantic lexical resources. In addition to the Princeton WordNet of English, there are several large wordnets built for different natural languages, e.g. BulNet (Bulgarian), Czech Wordnet, DanNet, Dutch Open Source WordNet, enWordNet (English, a significant extension to Princeton WordNet), Estonian WordNet, Finnish WordNet, GermaNet, MultiWordNet, plWordNet (Polish), sloWNet, as well as several semantic resources similar to or linked to wordnets. Many of them are very large and provide extensive coverage of vocabulary. All of them are mapped in some way to different versions of Princeton WordNet. There are also initiatives to build multilingual platforms linking several wordnets, e.g. Open Multilingual Wordnet. Some wordnets are connected to Linked Open Data, which makes it possible to anchor named entities, their relational descriptions and ontologies in natural language lexical semantics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Preface</title>
      <p>Wordnets already form a rich set of lexical semantic resources that can serve as a case study in
building interoperable solutions for multilingual semantic lexical resources, meeting the needs of
semantic interoperability on several levels: infrastructure, metadata, language tools and
research applications. However, this potential has not yet been fully explored. The main
problems that we identified before the workshop are:
• the variety of formats for wordnets,
• small but significant differences between wordnet models, which cause problems in
mapping wordnet relations across languages and make their full use in multilingual
applications more difficult,
• the lack of a common set of applications for wordnets across different languages, e.g. for
storing and accessing the full content of wordnets (including their use as a reference set),
text-to-sense mapping, word and text similarity calculation, etc.</p>
      <p>The main challenges for wordnet developers seem to be:
• common formats for wordnets with their whole richness of semantic knowledge
represented and models applied,
• inter-operability on the level of the interpretation of wordnet models (correspondence
of basic elements and different types of relations),
• requirements for a basic set of wordnet-related tools, web services and applications,
• development of wordnet browsing services or inter-operable services,
• common framework for linking wordnets with Linked Open Data or domain
knowledge resources for the needs of applications,
• gathering and sharing good practices in utilising wordnets and similar semantic
resources in applications.</p>
      <p>The program for this workshop was a little unusual. First, we asked participants to read
the papers ahead of the workshop. Then the workshop itself was split into two sessions. In
the first session, each participant had 15 minutes to highlight one or two of the challenges
they considered most important, possibly with an outline of a solution. In the second session,
we summarized the first session and then followed up with a long moderated discussion involving
all participants. There we tried to reach a deeper understanding of the problems and
sketch some possible solutions. Brief notes from the summary are included in this preface.</p>
      <p>All the papers, the summary and notes from the discussions are online at:
http://clarin-pl.eu/workshop/wn-challenges.html. We would like to thank the
CLARIN-PL research infrastructure and the Global WordNet Association Board for their support.</p>
    </sec>
    <sec id="sec-2">
      <title>Organization</title>
      <sec id="sec-2-1">
        <title>Organizers</title>
        <p>Maciej Piasecki, Wrocław University of Science and Technology
Francis Bond, Nanyang Technological University
Jan Wieczorek, Wrocław University of Science and Technology</p>
      </sec>
      <sec id="sec-2-2">
        <title>Program Committee</title>
        <p>Eneko Agirre, University of the Basque Country
Francis Bond, Nanyang Technological University
Sonja Bosch, University of South Africa, Pretoria
Christiane D. Fellbaum, Princeton University
Darja Fišer, University of Ljubljana
Antoni Oliver Gonzalez, Open University of Catalonia
Shu-Kai Hsieh, National Taiwan University
John P. McCrae, Insight Centre for Data Analytics, National University of Ireland, Galway
Verginica Mititelu, Romanian Academy
Monica Monachini, National Research Council of Italy
Adam Pease, Articulate Software
Bolette Sandford Pedersen, University of Copenhagen
Maciej Piasecki (chair), Wrocław University of Technology
Alexandre Rademaker, IBM Research
German Rigau, Polytechnic University of Catalonia
Ewa Rudnicka, Wrocław University of Technology
Shikhar Kr. Sarma, Gauhati University
Stanisław Szpakowicz, Emeritus Professor, University of Ottawa
Veronika Vincze, University of Szeged
Piek Th. J. M. Vossen, VU University Amsterdam</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Workshop Schedule</title>
      <table-wrap>
        <table>
          <thead>
            <tr>
              <th>Time</th>
              <th>Title</th>
              <th>Presenter/Moderator</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>13:00‒13:15</td>
              <td>Welcome</td>
              <td>Maciej Piasecki</td>
            </tr>
            <tr>
              <td>13:15‒13:30</td>
              <td>How Stable are WordNet Synsets?</td>
              <td>Eric Kafe</td>
            </tr>
            <tr>
              <td>13:30‒13:45</td>
              <td>Inside Baseball: Coverage, quality, and culture in the Global WordNet</td>
              <td>Martin Benjamin</td>
            </tr>
            <tr>
              <td>13:45‒14:00</td>
              <td>Two corpus based experiments with the Portuguese and English Wordnets</td>
              <td>Fabricio Chalub, Alexandre Rademaker and Cláudia Freitas</td>
            </tr>
            <tr>
              <td>14:00‒14:15</td>
              <td>Overview and Future of Czech Wordnet</td>
              <td>Adam Rambousek, Karel Pala and Sandra Tukacova</td>
            </tr>
            <tr>
              <td>14:15‒14:30</td>
              <td>Challenges behind the data-driven Bulgarian WordNet (BulTreeBank Bulgarian Wordnet)</td>
              <td>Petya Osenova and Kiril Simov</td>
            </tr>
            <tr>
              <td>14:30‒14:45</td>
              <td>The Revision History of Estonian Wordnet</td>
              <td>Neeme Kahusk and Kadri Vider</td>
            </tr>
            <tr>
              <td>14:45‒15:00</td>
              <td>Towards Revised System of Verb Wordnet Relations for Polish</td>
              <td>Agnieszka Dziob, Maciej Piasecki, Marek Maziarz, Justyna Wieczorek and Marta Dobrowolska-Pigoń</td>
            </tr>
            <tr>
              <td>15:00‒15:15</td>
              <td>Classification of Adjectives in BulNet: Notes on an Effort</td>
              <td>Tsvetana Dimitrova and Valentina Stefanova</td>
            </tr>
            <tr>
              <td>15:15‒15:30</td>
              <td>The Concept of Lexical Platform</td>
              <td>Maciej Piasecki, Tomasz Walkowiak, Ewa Rudnicka, Tomasz Naskręt and Francis Bond</td>
            </tr>
            <tr>
              <td>15:30‒16:00</td>
              <td>Break</td>
              <td></td>
            </tr>
            <tr>
              <td>16:00‒18:40</td>
              <td>Challenges for Wordnets: Summary and Discussion</td>
              <td>Francis Bond (Moderator)</td>
            </tr>
            <tr>
              <td>18:40‒19:00</td>
              <td>Conclusions, thank yous and farewells</td>
              <td>Francis Bond and Maciej Piasecki</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-4">
      <title>Discussion Notes</title>
      <p>by Francis Bond (based on input from all present)</p>
      <p>These very rough notes are an attempt to list some of my (and Ed Hovy’s)
experiences, to discuss a little of what we are doing with the OMW, and to summarize the main issues
raised by the various papers. The views given here are not necessarily shared by all the
participants (but we all seemed to be in general agreement). The notes were shown as slides to start the
discussion and are reproduced here with minimal changes.</p>
      <sec id="sec-4-1">
        <title>Outline</title>
        <p>• Some problems identified in OntoNotes
• Some problems identified by NTU-MC
• Some solutions with the OMW
• Summary
– Technical
– Social
– Financial</p>
      </sec>
      <sec id="sec-4-2">
        <title>OntoNotes Problems</title>
        <p>• Technical
– Missing senses: named entities
– Indistinguishable senses: hard to distinguish reliably; merged things until inter-annotator agreement (IA) was 80%; hard to tell if important information was lost
• Social
– hard to add new terms</p>
      </sec>
      <sec id="sec-4-3">
        <title>NTU Multilingual Corpus Problems</title>
        <p>• Missing senses
– especially MWEs and cultural words
– non-English senses: gohan “cooked rice”, kome “rice grains”
– want to add multiple languages at once
– small variations are common (sg/pl, hyphen/space: night bird vs nightbird)
• Indistinguishable senses
– distinguishing information may not be seen easily
– verb frames, examples, network, …
– some things (40 or so) are not distinguishable: merge
• Missing POS: pronouns, classifiers, greetings, exclamations, …</p>
      </sec>
      <sec id="sec-4-4">
        <title>OMW Solutions</title>
        <p>• Link projects with a common shared structure
– document online and link to database (*)
– link through the CILI: also to other resources (SUMO, …)
– allow multiple wordnets for the same language: remove bottlenecks, encourage domain resources
– more information about senses and variants: allow orthographic variants, label, e.g. scientific names
– link to corpora
– rank using sense frequencies/automatically created ontologies
– retain and publish statistics on usage and downloads (*)
– allow online feedback and comments (*)
– more tools for checking: definition checking, suggest synonyms, PanLex, graph checks, corpus annotation, …</p>
      </sec>
      <sec id="sec-4-5">
        <title>Permanent Identifiers</title>
        <p>• Should we have sense-keys as well as synset identifiers?
• Will we use the lexicographer files any more?
– Will PWN keep using these? (yes)
• How about version control for the whole wordnet?
– Can we standardize on major.minor.patch?</p>
      </sec>
      <sec id="sec-4-6">
        <title>Quality Control</title>
        <p>• How can we get feedback for input regarding quality and selection?
– user feedback
– expert feedback
• Can we link to live examples?
• Can we copy structure more easily from other wordnets, …?
• What is the best way to build a wordnet?
– merge, transfer, some combination?
– automatic sources (from Wiktionary, PanLex, …)
• Meanings change over time: how can we represent this?</p>
      </sec>
      <sec id="sec-4-7">
        <title>Named Entities</title>
        <p>• Should we have them or take them out?
• What about fictional names?
– We need them in the Sherlock Holmes Corpus
• Can we get them from external resources?
– GeoNames
– Species
– DBpedia
– UMLS</p>
      </sec>
      <sec id="sec-4-8">
        <title>Domains and Productive Meaning Extensions</title>
        <p>• Domains
– Should we move them into sub wordnets?
– Or just add better domain markers?
– Should we keep supertypes (lexicographer files)?
• Productive Meaning Extensions: how should we handle these?
– quick to quickly
– actor to actress
– interest to interested
– clasp to unclasp</p>
      </sec>
      <sec id="sec-4-9">
        <title>Definitions and Examples</title>
        <p>• It is hard to write a good definition
• Should it be unique?
• Should it have redundant info from the links (e.g. domain)?
– possibly show in different interfaces
• Should it be disambiguated?
• What do we do if it changes?</p>
      </sec>
      <sec id="sec-4-10">
        <title>Ontology</title>
        <p>• Can we use wordnet as an ontology?
• What are the differences between WN and ontologies?</p>
      </sec>
      <sec id="sec-4-11">
        <title>Compatibility</title>
        <p>• The same thing is called something different by different groups
• Several solutions to similar problems
– project to a shared vocabulary (OMW)
– document this (calling for volunteers)
– lexical gaps, phrasesets, artificial synsets, …
• Mapping
– should we allow a variety of links to CILI?
– not just equal-synonymy</p>
      </sec>
      <sec id="sec-4-12">
        <title>Social</title>
        <p>• Restrictive licenses are restrictive
– Unless you can redistribute the data, it is hard to make a large compatible collection, …
– Unless the licenses are compatible it is illegal to merge, …
• How do we get feedback from end users/applications?
– e.g. if Kamusi adds something, how does it get back to the WN?
• How do we get acknowledgments back to the original wordnets?
– e.g. if someone looks it up in OMW/NLTK/BabelNet, how is credit fed back?
• We don’t have standard tools that we all use
– so there is duplication of effort
– can we have a fixed data specification so we can use any tool?</p>
      </sec>
      <sec id="sec-4-13">
        <title>Financial</title>
        <p>• It is hard to get money to maintain wordnets
– How can we make things easier to maintain?
• How can we persuade funders?
• How can we use funding/effort most effectively?</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>How do we know what is happening with wordnets/roadmap? Should we discuss this at GWC?</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>