<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>NLP applications: completing the puzzle</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ruben Izquierdo</string-name>
          <email>ruben.izquierdobevia@vu.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <addr-line>Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Natural Language Processing and Computational Linguistics communities have traditionally faced different problems with specific approaches, mostly in an isolated manner or in a pipeline fashion. The former approaches focus on solving one particular aspect of Natural Language Processing without considering other problems, easily ending up in incoherent solutions. Pipeline approaches tackle one problem at a time in a sequence of sub-problems, where the output of one step is the input of the next. These methods suffer from error propagation, tend to be too deterministic (one decision cannot be changed later) and lead to sub-optimal solutions. To exemplify this problem, Figure 1 shows the result of an error analysis that we performed on the participant outputs from SensEval-2 to SemEval-2013. The table in that figure shows the error rate on monosemous words, which was due mainly to part-of-speech errors (the tagger marks a word as an adjective when it is a noun) or to errors in multiword detection (the systems tag "stuck" when they should tag "get stuck"). In SemEval-2010 this error rate reaches 98%. More details on this error analysis can be found in [1].</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Description</title>
      <p>Another aspect that seems not to be fully considered is the role of context. For example, WSD systems usually restrict the context of a word to a very narrow window of tokens around the target word, usually no bigger than the sentence in which the token occurs. This is clearly not enough in those cases where the clues for the proper meaning of the word are to be found in another part of the document, or even outside the document (background information). Following another example from the error analysis mentioned in the previous paragraph, we include a comparison of the average performance of the systems on the cases where the most frequent sense applies and on the rest of the cases. The results can be seen in Figure 2: the systems clearly perform very well on the most frequent cases, but performance drops dramatically on the rest. One reason could be that the systems are not modelling the context properly and are simply induced to apply the most frequent sense in all cases.</p>
      <p>These issues derive directly from the way Natural Language Processing has been conceived and the way in which NLP applications have been developed. These applications are framed mostly within computer-science frameworks, in which it is relatively easy to define a specific task and an optimal expected output, but this is not so trivial in NLP. We propose to see Natural Language Processing as a big puzzle. The different tasks are small pieces that must fit perfectly in order to build an overall puzzle that represents the interpretation of a document or a text. Following the puzzle analogy, the pieces cannot be considered in isolation. Moreover, external information is sometimes required to complete the puzzle, for example knowing what is depicted in the puzzle in order to get clues about how to put the pieces together. Figure 3 shows this idea: every NLP task is a small portion of the puzzle in which all the pieces must fit, but the pieces of one task must also fit with the rest of the puzzle (the rest of the NLP tasks).</p>
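      <p>The most-frequent-sense behaviour described above can be sketched as follows; the sense inventory and frequency counts are invented for illustration (real systems would read them from a resource such as WordNet), but the point carries over: the baseline never looks at the context at all.</p>

```python
# Toy most-frequent-sense (MFS) baseline for word sense disambiguation.
# The inventory below is hypothetical, invented for this illustration.
SENSE_FREQUENCIES = {
    "bank": {"financial_institution": 74, "river_side": 12},
    "plant": {"factory": 40, "living_organism": 35},
}

def mfs_baseline(word, context):
    """Ignore the context entirely and return the most frequent sense."""
    senses = SENSE_FREQUENCIES[word]
    return max(senses, key=senses.get)

# Whatever the surrounding text says, the prediction never changes:
print(mfs_baseline("bank", "she sat on the bank of the river"))
print(mfs_baseline("bank", "he deposited cash at the bank"))
# Both calls print "financial_institution".
```

      <p>A system that behaves like this will score well exactly on the cases where the most frequent sense happens to apply, and fail on the rest, which is the pattern observed in Figure 2.</p>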
      <p>Hence, the scope of this work is to bring together approaches that consider the hypotheses presented above in different ways: for instance, approaches that try to solve several NLP tasks at the same time, mutually sharing information among the specific subtasks in order to reach a good overall solution. Another interesting line of research is the use of external knowledge resources (such as DBpedia, Wikipedia or the Web) to extract background and real-world information that can be used to understand texts and solve NLP problems.</p>
      <p>This workshop has not been organized previously, but we think it deals with very relevant topics that are currently being faced across a wide range of NLP fields. It targets anybody working on Computational Linguistics and Natural Language applications who is concerned with the ideas and approaches presented here. Topics of interest include, among others:</p>
      <p>Papers submitted to this workshop should address some of these points:
- Dealing with more than one NLP task
- Using background information, external sources and Linked Data
- Combining different external resources
- Modeling the context, considering scopes larger than the sentence
- Processing multiple documents and linking information across them
- Influence of the domain, and building domain-specific resources to help NLP applications</p>
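      <p>The contrast between a sentence-bound context window and a document-level context, as raised in the points above, can be sketched as follows. The function names and the toy document are hypothetical, chosen only to illustrate how much evidence a narrow window discards.</p>

```python
# Sketch: sentence-window context vs. document-level context for a target word.

def sentence_context(sentences, sent_idx, target, window=2):
    """Tokens within a small window around the target, inside one sentence."""
    tokens = sentences[sent_idx].split()
    i = tokens.index(target)
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]

def document_context(sentences, target):
    """Every co-occurring token in the whole document except the target."""
    return [t for s in sentences for t in s.split() if t != target]

doc = [
    "the bank was closed",
    "fishermen lined the muddy shore nearby",
]
print(sentence_context(doc, 0, "bank"))  # only: ['the', 'was', 'closed']
print(document_context(doc, "bank"))     # also: fishermen, muddy, shore, ...
```

      <p>The sentence window gives no clue that the "river side" sense is intended; the clue ("fishermen", "shore") only appears at document level, which is precisely the kind of wider context modelling the workshop calls for.</p>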
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Ruben</given-names>
            <surname>Izquierdo</surname>
          </string-name>
          , Marten Postma and
          <string-name>
            <given-names>Piek</given-names>
            <surname>Vossen</surname>
          </string-name>
          .
          <article-title>Error analysis of Word Sense Disambiguation</article-title>
          ,
          <source>Proceedings of CLIN 2015: Computational Linguistics in The Netherlands, Antwerp, Belgium, February</source>
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>