<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>When it's all piling up: investigating error propagation in an NLP pipeline</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tommaso Caselli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Piek Vossen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marieke van Erp</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antske Fokkens</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filip Ilievski</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruben Izquierdo Bevia</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Minh Le</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roser Morante</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marten Postma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computational Lexicology and Terminology Lab The Network Institute VU University Amsterdam</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>We present an analysis of a high-level semantic task, the construction of cross-document event timelines from SemEval 2015 Task 4: TimeLine, to trace errors back to the components of our pipeline system. Event timeline extraction requires many different Natural Language Processing tasks, among which entity and event detection, coreference resolution and semantic-role-labeling are pivotal. These tasks in turn depend on other low-level analyses. This paper shows where errors come from and whether they are propagated through the different layers. We also show that the performance of each of the subtasks is still insufficient for the complex task considered. Finally, we observe that there is not enough semantics and inferencing within standard NLP techniques to perform well.</p>
      </abstract>
      <kwd-group>
        <kwd>NLP</kwd>
        <kwd>error analysis</kwd>
        <kwd>temporal event ordering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Textual interpretation requires many analyses, ranging from tokenization and
PoS-tagging to high-level semantic tasks such as entity detection or
semantic-role-labeling (SRL). In Natural Language Processing (NLP), such tasks are usually
considered in isolation. However, the output of low-level analyses is used to
achieve higher levels of interpretation, and it is well known that their errors
propagate to higher levels. Furthermore, different modules may make incoherent
claims about the same text elements. A Named-Entity Recognition and
Classification (NERC) system may say that Ford is an entity of the type person, whereas
the entity linker finds the URI of the company and the SRL module assigns the
role of vehicle. Typical NLP architectures allow dependencies in one direction
only and do not have any mechanism to reconcile inconsistency. Therefore most
high-level modules are not designed to detect inconsistency and simply ignore
competing results from other modules. Especially when high levels of semantic
processing and interpretation are the goal, it becomes very difficult to relate the
performance of the system to the performance of each of the sub-modules and
to improve it.</p>
      <p>In this paper, we present an error analysis of a complex semantic task, the
SemEval 2015 Task 4: TimeLine: Cross-Document Event Ordering1, to learn more
about the dependencies between modules and missed opportunities. The most
relevant NLP sub-tasks for timeline extraction are entity and event detection,
coreference resolution, semantic-role-labeling and time expression recognition
and normalization. Timelines are created by combining the results of 10 modules
carrying out subtasks. This pipeline is an excellent case for demonstrating the
complexity of the process and the (non-)dependencies between the modules.
Our analysis starts from the errors in the created timelines and drills down to
the lower levels that provide the necessary elements. We show that error propagation
from lower levels occurs, but its impact remains limited. Errors from high-level
tasks piling up are the main cause of the overall low performance. Our analyses
reveal that several errors can be avoided if information from the various modules is
integrated better. However, only full comprehension of the context could yield
high results on this task.</p>
      <p>
        Our study differs from previous work in that we quantify error dependencies
and shortcomings of a pipeline system rather than providing users with tools for
error tracking or component integration. To the best of our knowledge, the most
similar work we have identified in the literature is Clark et al. (2007) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where the
authors aimed at identifying what type of lexical knowledge is required to tackle
the Textual Entailment Task.
      </p>
      <p>The paper is further structured as follows. In Section 2 we describe the task
and the NLP pipeline that we used to participate. We give a detailed trace-back
analysis for the errors of our system in Section 3. We discuss these errors further
in Section 4, after which we conclude in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Timeline Extraction</title>
      <p>
        The SemEval 2015 Task 4: TimeLine: Cross-Document Event Ordering aims
at identifying all events in a set of documents where a given entity plays the
PropBank [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] role Arg0 or Arg1. These events are then anchored to dates, where
possible, and ordered chronologically so as to create an entity-centric timeline.
A full description of the task can be found in Minard et al. 2015 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>Figure 1 illustrates our pipeline, indicating the dependencies among the
different modules.2 The pipeline modules mostly represent state-of-the-art systems,
some developed by third parties, and integrated in the NewsReader project.3
The rest of this section describes the main NLP modules of our system.</p>
      <sec id="sec-2-1">
        <title>Event Extraction and Entity-linking</title>
        <p>
          Event extraction and entity-linking is based on the output of the SRL module of
our pipeline. The predicates identified by the SRL module form candidate events.
1 http://alt.qcri.org/semeval2015/task4/
2 A detailed description can be found in Agerri et al. (2014) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
3 http://www.newsreader-project.eu
The SRL module outputs PropBank roles, so we can directly identify relevant
event-entity links by selecting the Arg0 and Arg1 roles of the predicates.
        </p>
        <p>
          The SRL module we use is based on the MATE-tools [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. It takes tokens and
PoS-tags (which are used as features) as input and performs dependency parsing
and SRL in one joint step.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Entity Identification and Coreference</title>
        <p>The next step is to identify which events have an Arg0 or Arg1 that corresponds
to the target entity of a timeline. Three modules are involved in this sub-task: a)
NERC, to identify and classify names; b) NED, to disambiguate named entities
by means of DBpedia URIs; and c) coreference resolution, to identify which other
expressions in the text have the same referent.</p>
        <p>
          The NERC module is a supervised machine learning system trained on the
CoNLL 2002 and 2003 shared tasks [
          <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
          ]. It takes tokens, lemmas and PoS
tags as input and uses local features only. The outcome of the classifier is taken
as input for the NED module, which is based on DBpedia Spotlight.
        </p>
        <p>
          The coreference resolution module is a reimplementation of Stanford's Multi
Sieve Pass system [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. It is a rule-based system that uses lemmas, PoS-tags, the
identified entities and constituents.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Event Coreference</title>
        <p>
          The event coreference module uses the predicates from the SRL layer. Predicates
with the same lemma in the same document are considered coreferential, as are
predicates with high similarity in WordNet [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
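        <p>As an illustration, the heuristic described above can be sketched as follows. This is a minimal sketch, not the module's actual implementation: the greedy clustering scheme, the pluggable similarity function and the 0.8 threshold are illustrative assumptions.</p>

```python
def coreferential_events(predicates, similarity=None, threshold=0.8):
    """Cluster predicate mentions of one document into coreference sets.

    predicates: list of (mention_id, lemma) pairs from the SRL layer.
    similarity: optional function (lemma, lemma) -> [0, 1], standing in for
    a WordNet-based similarity measure; the threshold is illustrative.
    """
    clusters = []
    for mid, lemma in predicates:
        placed = False
        for cluster in clusters:
            rep_lemma = cluster[0][1]  # lemma of the cluster's first mention
            # Same lemma always corefers; otherwise fall back to similarity.
            if lemma == rep_lemma or (
                similarity is not None
                and similarity(lemma, rep_lemma) >= threshold
            ):
                cluster.append((mid, lemma))
                placed = True
                break
        if not placed:
            clusters.append([(mid, lemma)])
    return [[mid for mid, _ in cluster] for cluster in clusters]
```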
      </sec>
      <sec id="sec-2-4">
        <title>Time Expressions</title>
        <p>
          We use FBK TimePro to identify time expressions in text. This machine learning
system is trained on TempEval3 data [20]. Time expressions are subsequently
normalized using the timenorm library [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which is integrated in the system.
FBK TimePro uses tokens, lemmas, PoS-tags, the entities identified by the
NERC module, and constituents.
        </p>
      </sec>
      <sec id="sec-2-5">
        <title>Aggregating Timelines</title>
        <p>
          Each of the above modules generates an independent stand-off layer of
annotation in the NLP Annotation Format (NAF, [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]), which needs to be combined to
generate the timelines. For this purpose, the mentions of identical entities
and identical events are combined in a single representation, where the events
are anchored to time expressions or to the document publication date. Event
anchoring is based on a baseline system that links events to temporal expressions
mentioned in the same sentence, in one of the two preceding sentences, or in the
following sentence. For all matched mentions of an entity (including coreferential
phrases), we consider the roles of all mentions of events in which they are involved
and output each event, with or without a time anchor, in the assumed proper
time order. Likewise, our system potentially detects events for entities such as
Airbus in cases where the role is described through coreferential expressions,
such as they or the plane. In Table 1 we report the performance of our system
in the SemEval task and for four target entities.4
        </p>
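        <p>The baseline time-anchoring step can be sketched as follows; representing events and time expressions by sentence indices, and the exhaustive pairing, are assumptions for illustration, matching the window described above (same sentence, two preceding, one following).</p>

```python
def anchor_events(events, timexes):
    """Baseline event-to-time anchoring: link an event to every temporal
    expression occurring in the same sentence, in one of the two preceding
    sentences, or in the following sentence.

    events:  list of (event_id, sentence_index)
    timexes: list of (timex_id, sentence_index)
    Returns candidate (event_id, timex_id) links.
    """
    links = []
    for ev, ev_sent in events:
        for tx, tx_sent in timexes:
            # Window: timex sentence in [event sentence - 2, event sentence + 1].
            if ev_sent - 2 <= tx_sent <= ev_sent + 1:
                links.append((ev, tx))
    return links
```

Since there is no disambiguation, an event falling near several time expressions receives several links, which is consistent with the over-generation reported in Section 3.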
      <p>Overall, performance is very low and varies considerably across the four entities.
Although it is a very difficult task, a deep error analysis is required to find the
main causes of this performance; this analysis is presented in the next section.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Error Analysis</title>
      <p>To get more insight into the problems of this task, we want to answer the following
questions:
- Which modules are responsible for most errors?
- How are errors propagated to other modules in the pipeline?
- To what extent do different modules make different statements on the same
interpretation?</p>
      <p>All of our modules have been benchmarked against standard datasets,
obtaining state-of-the-art performances.5 However, what these figures do not tell
us is how errors propagate and whether a certain number of errors in different
modules makes it impossible to derive the high-level structure of a timeline.</p>
      <p>
        We perform an in-depth error analysis of four target entities (Airbus, Airbus
A380, Boeing and Boeing 777) by reversing the steps in the pipeline to find the
module that breaks the result. We used false negative (FN) and false positive
(FP) errors as our starting point.
4 More details on the system can be found in Caselli et al. (2015) [
        <xref ref-type="bibr" rid="ref4">4</xref>
          ].
5 For more details on the benchmarking see Erp et al. (2015) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>Events</title>
        <p>The analysis of events involved a total of 178 events for the FNs of all four
target entities and 209 events for the FPs of Airbus and Airbus A380. These two
entities are paradigmatic cases, i.e., the error types detected for them are expected
to apply to the other entities as well.</p>
        <p>We identified 21 error types: 9 error types are due to specific annotation
layers (e.g. SRL or Event detection), 9 are due to a combination of annotation
layers (e.g. SRL + Entity coreference), and 3 are not related to the pipeline
(e.g. gold data or spelling errors). Table 2 provides an overview of the errors.</p>
        <p>False Negatives: The analysis highlighted three major sources of errors,
all related to high-level semantic tasks, namely Event detection (39 events),
Entity coreference (36 events) and SRL (23 events).</p>
        <p>Error propagation from low-level annotation layers affects event detection in
only 8 cases (i.e. wrong PoS tagging). The majority of cases are directly affected
by the SRL layer, with 11 instances of partial detection (i.e. multi-token events)
and 24 cases of events realized by nouns.</p>
        <p>Entity coreference affects event extraction indirectly, as the target events are
detected but cannot be connected to the target entities.</p>
        <p>Finally, the SRL module mainly introduces two types of errors: wrong role
assignment or implicit arguments in the predicate structure.</p>
        <p>Notes to Table 2:
a. Includes 1 case due to wrong PoS tagging.
b. Includes 2 cases of partial detection and 1 case due to wrong PoS tagging.
c. Includes 6 cases of partial detection and 3 cases due to wrong PoS tagging.
d. Includes 3 cases of partial detection and 3 cases due to wrong PoS tagging.
e. All cases also include errors with Final Entity Extraction.</p>
        <p>False Positives: These errors point to issues related to over-generation of
events and of event-entity links. Most errors concern Event coreference (39
cases, plus 10 cases in combination with other modules) and Event detection (15
cases, plus 43 in combination with other modules).</p>
        <p>The event coreference errors point both to a limit of the specific annotation
layer and to a propagation of errors from the SRL module.</p>
        <p>
          The SRL module responsible for event detection not only misses most
nominal events, it also overgenerates them. This is due to the implementation
of the SRL module which labels all entries in NomBank [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] as events. This
assumption is valid in a broad perspective on argument structures, but not all
nouns with argument structures have an event reading.
        </p>
        <p>Final Entity Extraction refers to errors in entity matching for the
final timeline. The high number of errors for Airbus A380 (77) is caused by the
substring method used as a back-off strategy.</p>
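        <p>To illustrate why such a back-off over-generates, the substring test can be sketched as follows; the paper only states that a "substring method" is used, so the case-insensitive, bidirectional check is an assumption.</p>

```python
def entity_matches(target, mention):
    """Back-off entity matching: the mention and the target entity match
    when one is a (case-insensitive) substring of the other. Illustrative
    sketch only: "Airbus" matches "Airbus A380" even when the text refers
    to the company rather than the aircraft, producing false positives.
    """
    t, m = target.lower(), mention.lower()
    return t in m or m in t
```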
      </sec>
      <sec id="sec-3-2">
        <title>Semantic Role Labeling</title>
        <p>We analyzed the SRL for the 60 events in the gold timeline for the entity Airbus.
Table 3 provides an overview of our findings.</p>
        <p>Events in gold timeline (gold events): 60
Gold events in the SRL layer: 54
Gold events in out-of-competition timeline: 25
Gold events not in out-of-competition timeline: 29
Correct SRL assignment in SRL layer: 41
Correct SRL assignment not in out-of-competition timeline: 19
Correct SRL assignment in out-of-competition timeline: 22
Incorrect SRL assignment in SRL layer: 15
Incorrect SRL assignment not in out-of-competition timeline: 10
Incorrect SRL assignment in out-of-competition timeline: 5</p>
        <p>Events in out-of-competition timeline: 103</p>
        <p>The SRL layer contained 54 out of 60 gold events. From these 54 events, 41
had a correct SRL assignment. However, only 22 out of these 41 events ended
up in the system's timeline. We distinguish two groups among the 19 cases of
correct SRL that did not appear in the timeline:</p>
      </sec>
      <sec id="sec-3-3">
        <title>Entity in explicit semantic role (11 cases)</title>
        <p>We found 4 cases where the
NED system failed to link the named entity Airbus to the correct DBpedia URI.
In 2 cases, Airbus was not recognized as the head of the NP. In the other 5 cases,
the error is caused by the coreference module.</p>
      </sec>
      <sec id="sec-3-4">
        <title>Entity in implicit semantic role (8 cases)</title>
        <p>Implicit roles [10, 17] are roles
that are not instantiated in the clause, but can be recovered from the context.
In 4 cases, the role could be inferred from information in the same sentence, and
4 cases required previous context. These 8 errors clearly originate in the SRL
layer but are not errors of the SRL module, since the SRL assignment is correct
according to the state-of-the-art tools.
</p>
      </sec>
      <sec id="sec-3-5">
        <title>Entities</title>
        <p>Named entities can be mentioned explicitly in text or implicitly through
coreferential chains. We analyzed both cases by inspecting the entities which appear
in the roles Arg0 and Arg1 of the event predicates. Explicitly mentioned named
entities are analyzed using the following steps:
1. Did disambiguation (NED) or recognition (NERC) fail?
2. If disambiguation failed, is the entity recognition incorrect? This includes
both missed entities and incorrect spans (e.g. Airbus to instead of Airbus).
3. If recognition failed, are there problems with the PoS tags, lemmas or tokens?</p>
        <p>
          For each coreferential mention, we traced back the correct coreference chain(s).
We identified two types of errors: missing coreference link and wrong coreference
link. For each case, we also inspected whether the coreferential problem is caused
by a wrong PoS tag. Overall, 16 cases were identified where the anaphoric
mention was realized by a pronoun not linked to the correct antecedent. In 20 cases,
the anaphoric element was realized by a Full Definite Noun Phrase (FDNP) of
the form "the + (modifier) + noun". An in-depth analysis of these cases, in
the line of Poesio and Vieira (1998) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], showed that solving such coreferential
chains requires world knowledge related to the target entities. For instance,
to identify that the aircraft corefers with its antecedent Airbus A380, knowledge
of the fact that Airbus A380 is an (instance of an) aircraft is required.
        </p>
        <p>Results are reported in Table 4. The numbers for coreference relations only
concern direct links between a Named Entity and an anaphoric mention. Cases
of implicit arguments, or zero anaphora, are not reported in the table.</p>
        <p>Normalized time expressions are needed to anchor and order events on the
timeline. We analyzed the system's performance on the events in the gold timelines
as well as on the events identified by our pipeline for Airbus and Airbus A380.</p>
        <p>The gold standard of Airbus and Airbus A380 contains 103 events that need
to be linked to a specific date. The system has no mechanism for disambiguation
and extracted 208 links. There were 21 extracted events linked to a specific date
and 91 links found. Unsurprisingly, time anchoring has decent recall and low
precision: recall was 59.2% on gold events and 71.4% on the extracted events;
precision was 30.5% on gold and only 16.4% on the extracted set.</p>
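        <p>The reported percentages follow the standard precision and recall definitions over sets of event-date links; as a brief sketch (the link sets here are hypothetical):</p>

```python
def precision_recall(predicted, gold):
    """Precision and recall over sets of (event, date) links."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```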
        <p>Table 5 provides an overview of the outcome of our analysis. The first column
provides the number of correctly identified dates. We found five different errors
and a handful of cases where the date should be identified by event coreference
(i.e. not a time anchoring error). The first two errors are related to identifying
the correct date. FBK TimePro identified the wrong span, leading to 32 errors.
These errors occur despite the constituents being correct. The error propagates
up to timenorm, which provides the wrong interpretation of the date. Timenorm
furthermore systematically places underspecified dates in the past, so that all
future events are placed in the wrong year (37 errors).</p>
        <p>The next two errors concern the correct link between a date and an event.
Finally, there are interpretation errors, where event and date are indeed
connected, but the relation is not an anchoring one (but, e.g., a preceding relation).
Among them, 12 cases can only be solved by world knowledge or interpretation
of context. Multiple errors can occur around the same date or event, e.g., a date
may be placed in the wrong year and be wrongly linked to the event.</p>
        <p>Overall, the system is quite good at identifying time expressions and
interpreting them. The main challenge lies in linking events to the correct date.
This is also reflected by 13 cases where the Gold Standard seemed incorrect.
With everything piling up, our system performs much worse, with only 15 out of
108 events identified and anchored correctly.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <p>The analyses show that error propagation is an issue that mainly starts from
high-level tasks such as entity detection and linking, event detection and SRL.
The impact of error propagation from low-level layers appears to be minimal in
our data. For example, we noted that the PoS-tagger correctly detects proper
nouns, but the NERC module still fails.</p>
      <p>Another conclusion is that modules require more semantic analysis and
inferencing. This specifically applies to coreference and implicit arguments of events.
Possibly, this requires richer semantic resources and broadening the context of
analysis rather than limiting it to local and structural properties. In many cases,
participants of events are mentioned in other sentences than the one in which the
event occurs, which is missed by sentence-based SRL. Semantics can also
exploit word-sense-disambiguation (WSD) for coreference and event detection. For
event coreference, the highest similarity is currently taken across all meanings
of predicates, connecting the same predicates in different meanings across the
text. Restricting the similarity to dominant meanings appears to eliminate false
positives of event coreference links, yielding higher f-measures. The opposite holds
for nominal coreference, which fails to find bridging relations across NPs and
entities (false negatives) due to a lack of semantic connectivity. Widening reference
to capture metonymy, e.g. Airbus A380 implies Airbus, requires more elaborate
semantic modeling. With respect to event detection, WSD can eliminate false
positive nominals that are not events in their dominant meaning.</p>
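      <p>The dominant-meaning restriction suggested above can be sketched as a filter over candidate coreference pairs; the sense inventory and the similarity function are placeholders for illustration, not components of our pipeline.</p>

```python
def filter_by_dominant_sense(pairs, dominant_sense, similarity, threshold=0.8):
    """Keep a candidate event-coreference pair only when the dominant
    (most frequent) senses of the two predicate lemmas are similar enough,
    instead of taking the highest similarity across all sense pairs.

    pairs: list of (lemma_a, lemma_b) candidate links
    dominant_sense: mapping lemma -> sense identifier
    similarity: function (sense, sense) -> [0, 1]; threshold is illustrative
    """
    return [
        (a, b)
        for a, b in pairs
        if similarity(dominant_sense[a], dominant_sense[b]) >= threshold
    ]
```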
      <p>
        Finally, temporal processing must be improved, with a particular focus on
temporal anchoring. Time expressions are sparse, but they are the most
reliable source of information for anchoring events and provide partial ordering.
Additional research based on the notion of temporal containers [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is needed
as temporal anchoring remains challenging even with perfect detection. Richer
temporal resources are needed to improve the detection of explicit and implicit
temporal relations between events. The order in which events are mentioned can
provide a baseline, but its performance depends on the text type.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>We provided a detailed error analysis of an NLP pipeline system that took part
in the SemEval 2015 Task 4: TimeLine. We have shown how the system's poor
performance is partly due to errors in different modules of the pipeline (e.g.
SRL, NERC and NED) and partly to the lack of rich semantic models in current
state-of-the-art systems.</p>
      <p>
        An interesting aspect is the relatively low error propagation from low levels to
high levels. This shows that the most basic tasks, such as tokenization, PoS-tagging
and lemmatization, achieve reliable performance and could almost be considered
solved. We say almost, because errors in these layers do propagate to high-level
layers (also see Manning (2011) [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]).
      </p>
      <p>Semantic tasks, such as NERC, NED, event detection, SRL and temporal
processing, are far from solved. The mismatch in performance between
benchmark datasets and this task indicates that more research and new solutions
are required. We found evidence that some sub-tasks are deeply interrelated (e.g.
event detection and SRL; NERC, NED and entity coreference) and require better
integration to boost performance. Furthermore, some errors can only be avoided
by rich semantic models interpreting context.</p>
      <p>The complex task of timeline construction will first of all greatly benefit from
dedicated modules for event detection: the presence of an argument structure
does not necessarily correlate with an event reading. Secondly, coreference
resolution needs to be improved through more elaborate semantic modeling. Finally,
no module uses the output of the WSD module, despite evidence that a more
optimal selection of senses would help their tasks. Future work will concentrate on
these pending issues.</p>
      <p>20. UzZaman, N., Llorens, H., Allen, J., Derczynski, L., Verhagen, M., Pustejovsky,
J.: SemEval-2013 task 1: Tempeval-3: Evaluating time expressions, events, and
temporal relations. In: Proceedings of SemEval-2013. pp. 1-9 (2013)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aldabe</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beloki</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laparra</surname>
            , E., de Lacalle,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soroa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokkens</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Izquierdo</surname>
            , R., van Erp,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vossen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girardi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minard</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          :
          <article-title>Event detection, version 2</article-title>
          .
          <source>NewsReader Deliverable 4.2.2</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A Synchronous Context Free Grammar for Time Normalization</article-title>
          .
          <source>In: Proceedings of EMNLP 2013</source>
          . pp.
          <volume>821</volume>
          -
          <issue>826</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Bjorkelund,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Bohnet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Hafdell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Nugues</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.:</surname>
          </string-name>
          <article-title>A high-performance syntactic and semantic dependency parser</article-title>
          .
          <source>In: Proceedings of the 23rd COLING: Demonstrations</source>
          . pp.
          <volume>33</volume>
          -
          <issue>36</issue>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Caselli</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fokkens</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morante</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vossen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          : SPINOZA VU:
          <article-title>An NLP Pipeline for Cross Document Timelines</article-title>
          .
          <source>In: Proceedings of SemEval-2015</source>
          . pp.
          <volume>786</volume>
          -
          <issue>790</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harrison</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murray</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hobbs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>On the role of lexical and world knowledge in rte3</article-title>
          .
          <source>In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing</source>
          . Prague (
          <year>June 2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Erp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vossen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minard</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Speranza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Urizar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laparra</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Annotated data, version 2</article-title>
          .
          <source>NewsReader Deliverable 3.3.2</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <source>WordNet</source>
          . Wiley Online Library (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fokkens</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soroa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beloki</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ockeloen</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Hage</surname>
            ,
            <given-names>W.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vossen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>NAF and GAF: Linking linguistic annotations</article-title>
          .
          <source>In: Proceedings 10th Joint ISO-ACL SIGSEM Workshop</source>
          . p.
          <volume>9</volume>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Gerber</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chai</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meyers</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>The Role of Implicit Argumentation in Nominal SRL</article-title>
          .
          <source>In: Proceedings of NAACL HLT 2009</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gerber</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chai</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Semantic Role Labeling of Implicit Arguments for Nominal Predicates</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>38</volume>
          (
          <issue>4</issue>
          ),
          <fpage>755</fpage>
          -
          <lpage>798</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peirsman</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chambers</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Surdeanu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jurafsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Stanford's multi-pass sieve coreference resolution system at the CoNLL-2011 shared task</article-title>
          .
          <source>In: Proceedings of CoNLL-2011: Shared Task</source>
          . pp.
          <fpage>28</fpage>
          -
          <lpage>34</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :
          <article-title>Part-of-speech tagging from 97% to 100%: Is it time for some linguistics?</article-title>
          <source>In: Proceedings of CICLing'11 - Volume Part I</source>
          . pp.
          <fpage>171</fpage>
          -
          <lpage>189</lpage>
          . Springer-Verlag, Berlin, Heidelberg (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Minard</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Speranza</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aldabe</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Erp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magnini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Urizar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>SemEval-2015 Task 4: TimeLine: Cross-Document Event Ordering</article-title>
          .
          <source>In: Proceedings of SemEval-2015</source>
          . pp.
          <fpage>777</fpage>
          -
          <lpage>785</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gildea</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kingsbury</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>The Proposition Bank: A Corpus Annotated with Semantic Roles</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>31</volume>
          (
          <issue>1</issue>
          ) (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Poesio</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>A corpus-based investigation of definite description use</article-title>
          .
          <source>Computational Linguistics</source>
          <volume>24</volume>
          (
          <issue>2</issue>
          ),
          <fpage>183</fpage>
          -
          <lpage>216</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Pustejovsky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stubbs</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Increasing informativeness in temporal annotation</article-title>
          .
          <source>In: Proceedings of the 5th Linguistic Annotation Workshop</source>
          . pp.
          <fpage>152</fpage>
          -
          <lpage>160</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Ruppenhofer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee-Goldman</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sporleder</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morante</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Beyond sentence-level semantic role labeling: linking argument structures in discourse</article-title>
          .
          <source>Language Resources and Evaluation</source>
          <volume>47</volume>
          (
          <issue>3</issue>
          ),
          <fpage>695</fpage>
          -
          <lpage>721</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Tjong Kim Sang</surname>
            ,
            <given-names>E.F.</given-names>
          </string-name>
          :
          <article-title>Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition</article-title>
          .
          <source>In: Proceedings of CoNLL 2002</source>
          . pp.
          <fpage>142</fpage>
          -
          <lpage>147</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Tjong Kim Sang</surname>
            ,
            <given-names>E.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Meulder</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition</article-title>
          .
          <source>In: Proceedings of CoNLL 2003</source>
          . pp.
          <fpage>142</fpage>
          -
          <lpage>147</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>