<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NLP applications: completing the puzzle</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ruben Izquierdo</string-name>
          <email>ruben.izquierdobevia@vu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Natural Language Processing and Computational Linguistics communities have traditionally faced different problems with specific approaches, mostly in an isolated manner or in a pipeline fashion. The former approaches focus on solving one particular aspect of Natural Language Processing without considering other problems, easily ending up in incoherent solutions. Pipeline approaches tackle one problem at a time in a sequence of sub-problems, where the output of one step is the input of the next. These methods suffer from error propagation, tend to be too deterministic (one decision cannot be changed later) and lead to sub-optimal solutions. To exemplify this problem, Figure 1 shows the result of an error analysis that we performed on the participant outputs from SensEval-2 to SemEval-2013. The table in that figure shows the error rate on monosemous words, which was due mainly to part-of-speech errors (the tagger marks a word as an adjective when it is a noun) or to errors in multiword detection (the systems tag "stuck" when they should tag "get stuck"). In SemEval-2010 this error rate reaches 98%. More details on this error analysis can be found in [1].</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Description</title>
      <p>Another aspect that seems not to be fully considered is the role of context. For example, WSD systems usually restrict the context of a word to a very narrow window of tokens around the target word, usually no bigger than the sentence in which the token occurs. This is clearly not enough in those cases where the clues for the proper meaning of the word are to be found in another part of the document, or even outside the document (background information). Following another example from the error analysis mentioned in the previous paragraph, we include a comparison of the average performance of the systems on the cases where the most frequent sense applies and on the rest of the cases. The results can be seen in Figure 2: the systems clearly perform very well on the most frequent cases, but performance drops dramatically on the rest. One reason could be that the systems are not modelling the context properly and are simply induced to apply the most frequent sense in all cases.</p>
      <p>These issues derive directly from the way Natural Language Processing has been conceived and the way in which NLP applications have been developed. These applications are framed mostly within computer-science frameworks, in which it is relatively easy to define a specific task and an optimal expected output, but this is not so trivial in NLP. We propose to see Natural Language Processing as a big puzzle. The different tasks are small pieces that must fit perfectly in order to build an overall puzzle that represents the interpretation of a document or a text. Following the puzzle analogy, the pieces cannot be considered in isolation. Moreover, external information is sometimes required to complete the puzzle, for example knowing what is depicted in the puzzle in order to get clues about how to put the pieces together. Figure 3 shows this idea: every NLP task is a small portion of the puzzle in which all the pieces must fit, but the pieces of one task must also fit with the rest of the puzzle (the rest of the NLP tasks).</p>
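      <p>The most-frequent-sense behaviour described above can be sketched as follows; the sense inventory and frequency counts are invented for illustration (real systems would read them from a resource such as WordNet), but the point carries over: the baseline never looks at the context at all.</p>

```python
# Toy most-frequent-sense (MFS) baseline for word sense disambiguation.
# The inventory below is hypothetical, invented for this illustration.
SENSE_FREQUENCIES = {
    "bank": {"financial_institution": 74, "river_side": 12},
    "plant": {"factory": 40, "living_organism": 35},
}

def mfs_baseline(word, context):
    """Ignore the context entirely and return the most frequent sense."""
    senses = SENSE_FREQUENCIES[word]
    return max(senses, key=senses.get)

# Whatever the surrounding text says, the prediction never changes:
print(mfs_baseline("bank", "she sat on the bank of the river"))
print(mfs_baseline("bank", "he deposited cash at the bank"))
# Both calls print "financial_institution".
```

      <p>A system that behaves like this will score well exactly on the cases where the most frequent sense happens to apply, and fail on the rest, which is the pattern observed in Figure 2.</p>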
      <p>Hence, the scope of this work is to bring together approaches that consider the hypotheses presented above in different ways: for instance, approaches that try to solve several NLP tasks at the same time, mutually sharing information among the specific subtasks in order to reach a good overall solution. Another interesting line of research is the use of external knowledge resources (such as DBpedia, Wikipedia or the Web) to extract background and real-world information that can be used to understand texts and solve NLP problems.</p>
      <p>This workshop has not been organized previously, but we think it deals with very relevant topics that are currently being faced across a wide range of NLP fields. It targets anybody working on Computational Linguistics and Natural Language applications who is concerned with the ideas and approaches presented here. Topics of interest include, among others:</p>
      <p>Papers submitted to this workshop should address some of these points:
- Dealing with more than one NLP task
- Using background information, external sources and Linked Data
- Combining different external resources
- Modeling the context, considering scopes larger than the sentence
- Processing multiple documents and linking information across them
- Influence of the domain, and building domain-specific resources to help NLP applications</p>
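      <p>The contrast between a sentence-bound context window and a document-level context, as raised in the points above, can be sketched as follows. The function names and the toy document are hypothetical, chosen only to illustrate how much evidence a narrow window discards.</p>

```python
# Sketch: sentence-window context vs. document-level context for a target word.

def sentence_context(sentences, sent_idx, target, window=2):
    """Tokens within a small window around the target, inside one sentence."""
    tokens = sentences[sent_idx].split()
    i = tokens.index(target)
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]

def document_context(sentences, target):
    """Every co-occurring token in the whole document except the target."""
    return [t for s in sentences for t in s.split() if t != target]

doc = [
    "the bank was closed",
    "fishermen lined the muddy shore nearby",
]
print(sentence_context(doc, 0, "bank"))  # only: ['the', 'was', 'closed']
print(document_context(doc, "bank"))     # also: fishermen, muddy, shore, ...
```

      <p>The sentence window gives no clue that the "river side" sense is intended; the clue ("fishermen", "shore") only appears at document level, which is precisely the kind of wider context modelling the workshop calls for.</p>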
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Ruben</given-names>
            <surname>Izquierdo</surname>
          </string-name>
          , Marten Postma and
          <string-name>
            <given-names>Piek</given-names>
            <surname>Vossen</surname>
          </string-name>
          .
          <article-title>Error analysis of Word Sense Disambiguation</article-title>
          ,
          <source>Proceedings of CLIN 2015: Computational Linguistics in The Netherlands, Antwerp, Belgium, February</source>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>