=Paper=
{{Paper
|id=Vol-3671/paper7
|storemode=property
|title=Untangling a Web of Temporal Relations in News Articles
|pdfUrl=https://ceur-ws.org/Vol-3671/paper7.pdf
|volume=Vol-3671
|authors=Purificação Silvano,Evelin Amorim,António Leal,Inês Cantante,Alípio Jorge,Ricardo Campos,Nana Yu
|dblpUrl=https://dblp.org/rec/conf/ecir/SilvanoALCJ0Y24
}}
==Untangling a Web of Temporal Relations in News Articles==
Purificação Silvano1,*, Evelin Amorim2, António Leal1, Inês Cantante1, Alípio Jorge2,3,
Ricardo Campos2,4 and Nana Yu1
1 University of Porto/CLUP, Porto, Portugal
2 INESC TEC, Porto, Portugal
3 University of Porto, Porto, Portugal
4 University of Beira Interior, Covilhã, Portugal
Abstract
Temporal reasoning has been the focus of several studies during the past years, both in linguistics and
computational studies. Although advances on this topic are undeniable, there are still improvements
to be made and new avenues to pursue. One relevant problem concerns the temporal ordering of the
events, particularly asserting and representing how events are temporally related and how the story
told in the narrative evolves. This paper aims to analyse the temporal structure of narratives present
in news articles with the aid of different visualisations. To this end, we annotated a dataset of 119
news articles in European Portuguese following an annotation scheme that combines different parts of
ISO 24617-Language Resource Management - Semantic Annotation Framework (SemAF). The temporal
layer of this annotation scheme identifies the events and their main features, as well as the temporal
links between the events. The annotation provided us with paramount information about the temporal
characteristics of news at two levels: the story and the report levels. The visualisations that we propose
facilitate the process of understanding how news are temporally organised, providing a more practical
means to observe them.
Keywords
temporal structure, narrative news, annotation, visualisations
In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (eds.): Proceedings of the Text2Story’24 Workshop, Glasgow
(Scotland), 24-March-2024
* Corresponding author.
Email: msilvano@letras.up.pt (P. Silvano); evelin.f.amorim@inesctec.pt (E. Amorim); jleal@letras.up.pt (A. Leal);
cantante.ines@gmail.com (I. Cantante); amjorge@fc.up.pt (A. Jorge); ricardo.campos@inesctec.pt (R. Campos);
robertananayu@hotmail.com (N. Yu)
URL: https://www.purisilvano.pt (P. Silvano); https://github.com/evelinamorim (E. Amorim);
http://www.ccc.ipt.pt/~ricardo/ (R. Campos)
ORCID: 0000-0001-8057-5338 (P. Silvano); 3C18-B766-4A42 (E. Amorim); 0000-0002-6198-2496 (A. Leal);
0009-0002-3866-4550 (I. Cantante); 0000-0002-5475-1382 (A. Jorge); 0000-0002-8767-8126 (R. Campos);
0000-0003-4378-088X (N. Yu)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
1. Introduction
Understanding the temporal dimension of a text requires more than merely identifying events
and temporal expressions. Ultimately, it requires building structured
information about events and inferring their temporal relations before constructing a timeline
of events. Such tasks are still very challenging in Natural Language Processing (NLP) and
Information Retrieval (IR), and they can surely benefit from the input of linguistic analysis.
Currently, temporal extraction comprises three phases: (i) recognition of events and temporal
expressions; (ii) recognition of temporal relations between them; and (iii) timeline construction
based on the temporal relations [1]. The first phase has been widely researched and carried out
with a high degree of success (cf. [2] for the state of the art in temporal identification
and extraction tasks). However, the last two phases are more complex, and although some
research has produced encouraging results (cf. [2] for an overview), many problems persist,
requiring further work.
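As an illustration of phase (iii), the following minimal sketch derives one chronological ordering from a set of Before/After relations by topological sorting. It is not the method of any of the cited works; the function name `order_events`, the triple layout and the event names are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def order_events(links):
    """Return one ordering of events consistent with the Before/After links.

    `links` is an iterable of (source, relation, target) triples
    (a hypothetical layout chosen for this sketch).
    """
    sorter = TopologicalSorter()
    for source, relation, target in links:
        if relation == "Before":
            # source happens before target, so source is a predecessor of target
            sorter.add(target, source)
        elif relation == "After":
            # source happens after target, so target is a predecessor of source
            sorter.add(source, target)
        else:
            # other relations do not constrain the order here; keep the nodes
            sorter.add(source)
            sorter.add(target)
    # static_order() raises graphlib.CycleError if the links contradict each other
    return list(sorter.static_order())


# Hypothetical events, not taken from the dataset
links = [("alarm", "Before", "arrival"), ("arrest", "After", "arrival")]
timeline = order_events(links)
```

A contradictory set of links (for example, A Before B together with B Before A) makes the sort fail, which is one simple way of detecting inconsistent annotations before a timeline is drawn.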
Determining the temporal organisation of events becomes even more challenging when
dealing with texts that present events in a non-chronological order, like news articles, which
display an intricate temporal structure compared to other types of narratives [3]. It is widely
accepted that news stories are narratives, and as such, they can be analysed within the conceptual
framework of narratology [4]. However, contrary to other narratives, not only is there a different
linear arrangement of the narrative components, but also a reappearance or ’recycling’ of the
most critical aspects of the story throughout the news text [5], which has implications for the
temporal structure of news.
Moreover, as demonstrated by [6], news articles frequently contain two narratives: the
narrative of reporting the story targeted by the news, and the story itself, composed of the
reported events. Reported speech is a common technique used in news writing [7]. Journalists
use direct quotations to reproduce what others have said and indirect quotations to describe
what others think or say. This method is crucial when writing news, as reporters usually report
on what their sources of information have to say about what they have seen or know. Reported
speech and quotation are often treated in many studies as attribution relations [8], that is,
connections between pieces of information and the sources that express them. Thus, news can
have two levels of narrative: one that explains the events that make up the story being narrated
(what happened, where, when, and why), which is the sequence of events that is the news topic,
and another that describes the sources that provided the information to the journalist who
wrote the news (who said what to the journalist). These two levels appear in textual sequences
that alternate throughout the news text. Separating the two narratives is crucial in determining
the chronological order of the story’s events.
Regarding timelines constructed from temporal relations, researchers have explored
different formats. For instance, [9] and [10] used Message Sequence Charts (MSC) to represent
events and their relations in temporal order. [6] used a visual representation that employs
an analog clock to identify reporting events, the sources and their nested events. Regarding
visualisations aimed at the general public, most proposals focus on infographics and
timelines [11, 12]. Regardless of the format, visualisations are quite useful because, in addition
to the automatic determination of the temporal order of events, they allow manual inspection of
the temporal arrangement of events, either to get insights quickly or to analyse a particular
structure in depth. However, in most cases, such visualisations lack more specific information about
the events, which can be provided by manual annotation. This kind of information is essential
for a deeper analysis of the narrative temporal structure.
For this study, we defined three objectives. First, we aim to analyse the intricate relations
between the events throughout the narrative of the news annotated according to ISO-24617-1
[13]. Second, we work towards assessing how the story events are organised within each report
and across the reports. Finally, we seek to determine associations between the temporal relations
and some of the grammatical features of the events. The representation of the temporal relations
in two formats, the Bubble data structure proposed in [6] and the Message Sequence Chart [9],
enables a swift and time-saving analysis of temporal features of news articles.
The paper is organised as follows. Section 2 reviews some proposals about the temporal
structure of news. Section 3 describes the study, beginning with the description of the problem
and the research questions (3.1), followed by relevant information about the dataset and the
annotation scheme (3.2) and the study’s methodology (3.3). Subsection 3.4 reports on the results
of our study. Finally, Section 4 presents the overall conclusions.
2. Temporal structure of news articles
From a solely linguistic approach, several research works have yielded significant findings
regarding news temporal structure [4, 5, 14, 15, 16]. News articles are described as starting with
the main event in the lead and going back to earlier events, and presenting details in instalments
[14] in the body of the news, in a cyclic or “zigzagging pattern, with the time-line repeatedly
moving into the past and the future concerning the main event” [5]. For this reason, several
analyses describe news stories as non-chronological “at odds with the linear narrative point”
[4]. However, [15] argues that Bell’s analysis of news as non-chronological may be too hasty
because a closer examination of news temporal structure from a linguistic perspective unravels
the matching between the discourse structure and the underlying event structure. The author
presents evidence that events are frequently told in the order they occur. Whenever they are
not, the underlying order of the events can still be interpreted due to some linguistic devices,
contrary to what would happen if a story were truly non-chronological. In fact, a controlled
experiment with three subjects conducted by [17] disclosed that humans were quite capable of
untangling the order of the events.
From a computational point of view, some studies propose annotation schemes, methods and
techniques to identify and retrieve temporal information from news [18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28] (cf. also [29] for an overview of current temporal annotation schemes and corpora).
Naturally, regardless of all the progress, many problems persist (cf. [30] for an overview), mainly
because determining the chronology of events depends on multiple factors, such as linear order
of discourse, tense, temporal expressions, sentence meaning, and aspect. As discussed by [17],
although a computer easily computes the first three, the last two are more difficult to process.
Moreover, the complexity of temporal information retrieval of news is amplified by the
presence of reports with the mention of sources or attributions, which create a second narrative
layer, the narrative of the report. Each report comprises several events from the story [6]
temporally linked within and across the different reports. Identifying and extracting temporal
relations among the events that compose the story can be a troublesome endeavour, let alone
establishing all the temporal links between the story events and those that report the storyline.
To the best of our knowledge, none of the existing studies takes on the annotation, analysis,
and representation of the temporal relations between the two layers of telling a story in news
articles: the narrative of the report and the narrative of the story. Nonetheless, such a study can
produce relevant input for the task of temporal information retrieval.
3. The study
3.1. Problem and research questions
In this study, our objective is to characterise the temporal structure of narrative news. However,
since news comprises two intertwined levels, the level of the story being told and the level of
the report that tells that story [6], our aim is not only to determine the temporal structure of
the story but also to establish the role of the different report blocks in its temporal organisation.
Furthermore, with this study, we seek to ascertain which main features of events are associated
with the different temporal relations, that is, to figure out if one can relate a specific temporal
relation between two events to certain grammatical characteristics of those events. To this end,
we formulated three research questions:
RQ1: How are the events temporally organised at the story level?
RQ2: What is the report level’s role in the temporal organisation of the story?
RQ3: What is the association between the temporal relations and some of the events’
grammatical features?
The three research questions were created to help define the typical structure of news, and,
ultimately, assist in the development of automatic methods for information extraction. The first
question aims to define the prototypical temporal structure of news and provide important input
for automatic forms of temporal analysis. Extracting temporal information from unstructured
data can greatly benefit journalists and other stakeholders. However, it is a challenging task.
Creating timelines from text is relatively easy if temporal expressions unambiguously locate
situations. But it becomes more complicated if the text lacks temporal expressions. Therefore,
understanding how situations are organised throughout the news can help with automatic
extraction. The second question is related to the first one because to extract information and
organise it temporally, we need to distinguish between situations that are part of the story
being narrated and those that provide information about the sources used by the journalist to
write the news. Both types of situations occur alternately in the news, making it difficult to
automatically classify them without some linguistic input. The third question aims to deepen
our understanding of the characteristics of news events, which will impact both theoretical linguistics
and computer science. Determining the linguistic features of events is essential for automatic
methods of extracting and organising information, particularly in cases where there are no
explicit temporal adverbials or discourse markers. For example, in the sequence of sentences
"The boys played football. The girls swam in the pool.", the preferred temporal connection
between the situations is simultaneity, whereas in "The boys broke the window glass. The girls
tore the curtain.", the preferred temporal relationship is successivity. Understanding which
grammatical elements contribute to these temporal relations is key to temporal information
extraction.
3.2. Dataset and annotation
For this study, we used the T2S Lusa Annotated Dataset¹, which is a set of 119 news texts
retrieved from the T2S Lusa dataset². The T2S Lusa dataset contains 360 news articles collected
from the general news feed of Lusa, a Portuguese news agency. First, the team of linguists
conducted a preliminary manual analysis of a small set of Lusa news, and identified a set
of keywords that were more frequent in the news with a narrative nature, such as "assaults",
"robberies", "accidents", or "police interventions". Afterwards, Lusa collected the news containing
these keywords with a length restriction of 50 to 200 words. Finally, the linguists manually
checked this collection, resulting in a corpus of 360 news articles.
The data was annotated following a previously designed multilayered annotation scheme,
which combines four parts of ISO 24617, Language resource
management: Part 1 – Time and Events [13]; Part 4 – Semantic Roles [31]; Part 7 – Spatial
Information [32]; Part 9 – Reference Annotation Framework [33] (cf. [34] and [35]). It comprises
two types of structures: entity structures (events, times, participants, measures, and spatial
relations) and link structures (temporal, aspectual, spatial, subordination, objectal, and semantic
role links).
The focus of the current study is on tags that describe events and temporal links. Events
are annotated to identify eventualities (i.e. events and states) and their specific characteristics
using the Event tag, which specifies values for Class, Type, Part of Speech, and Tense. Class
identifies some characteristics of the lexical semantics of verbs. For example, the Reporting
value is given to events in the case of verbs that denote a situation in which an entity reports a
story or provides information about a particular situation. Type is related to the Aspect of the
situation, which determines its internal temporal structure, and can be a State (a situation in
which something obtains or holds), a Process (a durative and atelic situation), or a Transition (a
situation introducing a consequent state). Temporal links (TLinks) are used to identify temporal
relations between events, which are critical in narrative texts for determining the chronological
order of the events. TLinks have the following values: Before, After, Includes, Is_Included,
During, Simultaneous, Identity, Begins, Ends, Begun_By, and Ended_By.
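The TLink value set can be written down as a small data structure. The sketch below lists the eleven values named above and adds a converse mapping; the converse pairs follow the usual TimeML-style reading (Before/After, Includes/Is_Included, Begins/Begun_By, Ends/Ended_By) and are an assumption of this example, not something the annotation scheme is stated to encode. The class and function names are hypothetical.

```python
from enum import Enum


class TLink(Enum):
    """The eleven TLink values listed in the annotation scheme."""
    BEFORE = "Before"
    AFTER = "After"
    INCLUDES = "Includes"
    IS_INCLUDED = "Is_Included"
    DURING = "During"
    SIMULTANEOUS = "Simultaneous"
    IDENTITY = "Identity"
    BEGINS = "Begins"
    ENDS = "Ends"
    BEGUN_BY = "Begun_By"
    ENDED_BY = "Ended_By"


# Assumed converse pairs (standard TimeML-style reading, not from the paper)
_CONVERSE = {
    TLink.BEFORE: TLink.AFTER,
    TLink.INCLUDES: TLink.IS_INCLUDED,
    TLink.BEGINS: TLink.BEGUN_BY,
    TLink.ENDS: TLink.ENDED_BY,
}
_CONVERSE.update({v: k for k, v in list(_CONVERSE.items())})

# Relations that read the same in both directions
_SYMMETRIC = {TLink.SIMULTANEOUS, TLink.IDENTITY}


def converse(link):
    """Relation holding from B to A when `link` holds from A to B.

    Returns None for During, which has no converse in this value set.
    """
    if link in _SYMMETRIC:
        return link
    return _CONVERSE.get(link)
```

With such a mapping, a link annotated as After from event B to event A can be normalised to Before from A to B, so that counts of chronological and anti-chronological presentation are computed on a single direction.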
The dataset of 119 news articles was annotated by a PhD student in Linguistics who collaborated
in the development of the annotation scheme. The annotator discussed problematic cases
with a team of linguists before carrying out the annotations. To assess the reliability of the
annotations, a second annotator followed the guidelines of the annotation manual and annotated
a sample of 10% of the dataset (19 news articles) to test inter-annotator agreement. We first
measured agreement on event labelling, computed as a pairwise F1. We chose F1 over
Cohen’s Kappa for event labelling because of the high number of
non-labelled tokens (disregarded for this particular study), which can raise the kappa score
disproportionately [36, 37, 38]. The pairwise F1-score was 0.77, which
is substantial agreement. For the attributes relevant to the present study, we computed the
classical Cohen’s Kappa score: Class (0.63), Type (0.51), Part of Speech (0.81), and Tense (0.74).
Agreement is also substantial, except for Type, probably the most challenging attribute, for
which it is moderate. Concerning
the temporal links between the events, Cohen’s Kappa was 0.31, and the agreement on the
attributes of the temporal links resulted in a Cohen’s Kappa value of 0.32, both considered fair.
The difficulty of the task can explain these lower numbers. In future work, we will perform
more annotation experiments to explain the reasons for this discrepancy.
¹ https://rdm.inesctec.pt/en/dataset/cs-2023-018
² https://rdm.inesctec.pt/dataset/cs-2023-015
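The two agreement measures used for the annotation can be sketched as follows. This is a minimal illustration, not the script used for the study: it assumes that two event annotations agree only when their character spans match exactly, and the function names and the toy data are hypothetical.

```python
from collections import Counter


def pairwise_f1(spans_a, spans_b):
    """F1 between two annotators' event spans (exact span match assumed)."""
    a, b = set(spans_a), set(spans_b)
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)   # annotator A taken as the reference
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two label sequences over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and aligned")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # chance agreement from each annotator's label distribution
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data (not from the dataset): character spans and Type labels
f1 = pairwise_f1({(0, 5), (10, 18), (25, 31)}, {(0, 5), (10, 18), (40, 44)})
kappa = cohen_kappa(
    ["State", "Process", "Transition", "Transition"],
    ["State", "Transition", "Transition", "Transition"],
)
```

F1 is computed only over spans that at least one annotator marked, which is why it is unaffected by the large number of unlabelled tokens, whereas kappa on attributes is computed over the events both annotators labelled.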
3.3. Methodology
To answer our first research question (How are the events temporally organised at the story
level?), we focused on the temporal relations (TLinks) between all the events that comprise the
story. We excluded the reporting events, which belong to the report level, the second level of
discourse, and are connected to one another by the TLink Identity. As explained in [6], the
reporting events that compose this second level of discourse
are classified as belonging to the class Reporting and are linked by the TLink Identity because,
in a given piece of news, they are part of the same report, which is carried out by different
reporting events. Some examples are verbs such as informou (’informed’) and declarou (’declared’),
but also segundo (’according to’) + noun phrase or de acordo com (’according to’) + noun phrase,
which are also markers of
reporting events. Accordingly, when analysing the temporal relations between the story events
in an example like the one in A.1, we excluded the events classified as reporting (signalled in
bold red) and linked to one another by the TLink Identity.
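The exclusion step can be illustrated with a short sketch. It assumes a simplified in-memory representation of the annotation (the `Event` and `TLink` records, their field names and the class label "Occurrence" are hypothetical), and it simplifies the procedure by dropping every TLink that touches an event of class Reporting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    id: str
    text: str
    event_class: str   # "Reporting" or any other class value


@dataclass(frozen=True)
class TLink:
    source: str        # id of the source event
    target: str        # id of the target event
    relation: str      # e.g. "Identity", "After"


def story_level_links(events, tlinks):
    """Keep only the TLinks whose two ends are story (non-reporting) events."""
    reporting = {e.id for e in events if e.event_class == "Reporting"}
    return [
        link for link in tlinks
        if link.source not in reporting and link.target not in reporting
    ]


# Toy annotation using verbs quoted in the paper; the links are illustrative only
events = [
    Event("e1", "disse", "Reporting"),
    Event("e2", "feriu", "Occurrence"),
    Event("e3", "recusou", "Occurrence"),
    Event("e4", "acrescentou", "Reporting"),
]
tlinks = [
    TLink("e1", "e4", "Identity"),   # report level: dropped
    TLink("e3", "e2", "After"),      # story level: kept
]
story_links = story_level_links(events, tlinks)
```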
For our second research question (What is the report level’s role in the story’s temporal
organisation?), we aimed to determine the reports’ role in organising the story’s timeline. We
collected two types of information: (i) the TLinks between all the events within each report; and (ii)
the TLinks between the first story event of a report block B and any story event from the report
block A. The example shown in Appendix A.1 illustrates this procedure. There are five blocks,
represented in the Bubble structure in A.2.1: the first and the second reports are introduced by the
reporting verb disse (’said’), the third by the reporting event acrescentou (’added’), the fourth
by afirmou (’stated’) and the last one by indicou (’pointed’). These five blocks of the report
form the report level. Each block, represented by a Bubble, includes multiple story events. So,
first, we extracted all the TLinks connecting these events. Second, we extracted all the TLinks
between the story events embedded in the different Bubbles.
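The two extraction steps can be sketched as a single pass over the TLinks. The sketch assumes that each story event has already been assigned to the report block (Bubble) that introduces it; the mapping `block_of`, the triple layout and the event ids are hypothetical. It implements the broader reading, in which any link between events of different blocks counts as crossing reports; restricting the second group to links that start at the first story event of a block would be an additional filter.

```python
def split_by_report(block_of, tlinks):
    """Split TLinks into those within one report block and those across blocks.

    `block_of` maps a story-event id to the index of its report block;
    `tlinks` is an iterable of (source, relation, target) triples.
    """
    within, across = [], []
    for source, relation, target in tlinks:
        if source not in block_of or target not in block_of:
            continue  # at least one event is not introduced by any report
        if block_of[source] == block_of[target]:
            within.append((source, relation, target))
        else:
            across.append((source, relation, target))
    return within, across


# Toy data: e1 and e2 belong to the first block, e3 to the second
block_of = {"e1": 0, "e2": 0, "e3": 1}
tlinks = [
    ("e1", "Simultaneous", "e2"),
    ("e3", "After", "e2"),
    ("e4", "Before", "e1"),   # e4 is outside every report block
]
within, across = split_by_report(block_of, tlinks)
```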
For our third research question (What is the association between the temporal relations and
some of the events’ grammatical features?), we thoroughly analysed the features of the events
that form the story level. We aimed to uncover the connection between the grammatical
characteristics of the events and each temporal relation. To achieve this, we collected all the
relevant information from the tags Class, Type, Part of Speech (PoS), and Tense of the events
linked by each of the TLinks likely to impact temporal relations.
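A cross-tabulation of this kind can be sketched as follows for the Type attribute. The function names, the triple layout and the toy events are assumptions of the example; the same counting would apply to Class, PoS or Tense by swapping the attribute mapping.

```python
from collections import Counter, defaultdict


def type_pairs_by_relation(event_type, tlinks):
    """Count (source Type, target Type) pairs for each temporal relation."""
    table = defaultdict(Counter)
    for source, relation, target in tlinks:
        table[relation][(event_type[source], event_type[target])] += 1
    return table


def shares(counter):
    """Turn raw counts into percentages of the relation's total."""
    total = sum(counter.values())
    return {pair: 100 * n / total for pair, n in counter.items()}


# Toy annotation (not from the dataset)
event_type = {"e1": "Transition", "e2": "Transition", "e3": "State", "e4": "Process"}
tlinks = [
    ("e2", "After", "e1"),
    ("e1", "After", "e2"),
    ("e4", "After", "e2"),
    ("e2", "After", "e3"),
    ("e3", "Includes", "e1"),
]
table = type_pairs_by_relation(event_type, tlinks)
after_shares = shares(table["After"])
```

Keeping the pair ordered (source first, target second) preserves the distinction between, for example, a State that includes a Transition and a Transition that includes a State.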
The extraction and visualisation of these data were made possible by the package text2story³,
a Python package for extracting narratives from text programmatically.
Additionally, the package offers three types of visualisation for narratives: Knowledge
Graph (KG), Message Sequence Chart (MSC), and Bubble Diagram (BM). In this study, we
only employ the MSC and BM to visualise some news story elements. The MSC represents
the events that form the story level as lifelines, i.e., the coloured rectangles in sequence with
lines underneath. These events can be linked by the TLinks that were annotated in the text. The
order of events in the MSC is the same as the order in which they appear in the text. This
arrangement allows a more careful analysis of such events than would be possible if this
information were only available
in the annotation tool. One example can be seen in Figure 2 in Appendix A.2. In contrast, the
Bubble Diagram’s main goal is to represent the reporting events and each story event
attached to them. In this diagram, each Big Bubble represents a reporting
event. Inside each Big Bubble are little bubbles, representing events at the story level. The
bubbles, both big and little, are arranged clockwise, i.e., the first event that appears in
the text goes in the noon position, the second follows in a clockwise direction, and so on. The
TLinks between the little bubbles are drawn in the figure as well. One example is in Figure 1 in
Appendix A.2.
³ https://pypi.org/project/text2story/
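The clockwise arrangement of the Bubble Diagram can be expressed as a simple coordinate computation. The sketch below is not the drawing code of text2story; the function name, the even spacing of the bubbles and the unit radius are assumptions made for illustration.

```python
import math


def clock_positions(n, radius=1.0):
    """Centres (x, y) for n bubbles: the first at noon, the rest clockwise.

    The y axis points upwards, so decreasing the angle moves clockwise.
    """
    positions = []
    for i in range(n):
        angle = math.pi / 2 - 2 * math.pi * i / n   # pi/2 is the noon position
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions


# Four bubbles land at 12, 3, 6 and 9 o'clock
four = clock_positions(4)
# Five bubbles, as in the running example with five report blocks
five = clock_positions(5)
```

The same function can be reused for the little bubbles inside a Big Bubble by calling it with a smaller radius and adding the centre of the enclosing bubble to each position.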
3.4. Results
In the set of the 119 annotated news, 3068 events were identified, occurring both at the story
and report levels. There is only one news article devoid of the report level, which is evidence
of the prominence of this part in the overall structure of the type of news that composes the
dataset. The news article in Appendix A.1 illustrates the pattern followed by the generality
of the news that were annotated. In this particular example, as one can observe in the Bubble
diagram of Appendix A.2.1, the news includes five blocks of report marked by five reporting
verbs. Hence, a reasonably high number of reporting events building the report level is to be
expected. We found a total of 417 reporting events, an average of 3.50 per news article.
At the story level, the number of events is necessarily higher, with a total of 2651 events,
an average of 22.27 events per news, because the purpose of this type of journalistic text is
to inform the reader about a situation. So, returning to our running example in A.1, events
such as feriu, fazer disparos, recusou, parar (’wounded, shooting, refused, stop’) are part of the
story level. In this example, we can observe that not all the events that compose the story are
introduced by a reporting event. For instance, the information given in the third paragraph
about the event atingido (’hit’) is one of those cases.
Table 1 systematises the analysis of the temporal relations across the two levels under scrutiny:
the story and report levels.
Overall, the analysis of the temporal relations indicates that, for most cases, the temporal
relation Identity, picked whenever two events are the same, is the most frequent, which is
expected because, as explained in Section 2, prototypically, the journalist presents the main
event in the news’ lead and then other events related to the main one are presented in more
detail throughout the news. In our running example (A.1), this is true for the event fazer disparos
(’shooting’), which is introduced in the first report block and then resumed in the second
report block (cf. Appendix A.2.1). The recurrence of the TLink Identity is also related to some
annotation rules, such as the one about light verb constructions like fizeram disparos, which
stipulates that the noun (disparos) and the light verb (fizeram) must be linked by an Identity
TLink. This is recommended in the ISO standard to capture the fact that, despite being two
distinct words that have the potential to denote two different events, they represent a single
event in the constructions in question.
{| class="wikitable"
|+ Table 1: Temporal relations
! rowspan="2" | Relation
! colspan="3" | Between events...
|-
! (A) ...of the story
! (B) ...embedded in the reports
! (C) ...across the reports
|-
| Identity || 587 || 255 || 16
|-
| After || 546 || 173 || 15
|-
| Before || 420 || 151 || 7
|-
| Simultaneous || 413 || 176 || 3
|-
| Includes || 278 || 125 || 6
|-
| Is_Included || 232 || 109 || 5
|-
| During || 39 || 23 || 0
|-
| Begun_By || 4 || 3 || 0
|-
| Begins || 3 || 1 || 0
|-
| Ends || 2 || 0 || 0
|-
| Ended_By || 2 || 1 || 0
|}
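The rankings discussed in the text can be read off Table 1 mechanically. The following sketch copies the counts of Table 1 into a dictionary and sorts the relations per column; the variable and function names are hypothetical, and the column indices 0, 1 and 2 stand for columns A, B and C.

```python
# Counts copied from Table 1: (A) story, (B) within reports, (C) across reports
TABLE_1 = {
    "Identity": (587, 255, 16),
    "After": (546, 173, 15),
    "Before": (420, 151, 7),
    "Simultaneous": (413, 176, 3),
    "Includes": (278, 125, 6),
    "Is_Included": (232, 109, 5),
    "During": (39, 23, 0),
    "Begun_By": (4, 3, 0),
    "Begins": (3, 1, 0),
    "Ends": (2, 0, 0),
    "Ended_By": (2, 1, 0),
}


def ranking(column):
    """Relations sorted by decreasing count in the given column (0, 1 or 2)."""
    return sorted(TABLE_1, key=lambda rel: TABLE_1[rel][column], reverse=True)


def column_total(column):
    """Total number of TLinks counted in the given column."""
    return sum(counts[column] for counts in TABLE_1.values())


top_story = ranking(0)[:3]      # column A
top_within = ranking(1)[:3]     # column B
top_across = ranking(2)[:3]     # column C
```

Sorting per column makes the contrast between the levels explicit: Identity leads in all three columns, but Simultaneous overtakes After only among the events embedded in the same report.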
Regarding the events that form the story, the second most common temporal relation is After,
as shown in column A of Table 1. In this case, the events are presented chronologically, that is,
they are described in the order they happened. This applies to most of the events depicted in
paragraph six of our running example and represented in the MSC A.2.2. Our findings align
with the evidence presented in [15] and described in Section 2. The same observation
about the frequency of TLinks holds for the story events described in different report
blocks (cf. column C of Table 1). Although the differences from other temporal relations
are not so notable, the first event of report B is typically linked to an event of report A by
the TLink After, which means that there is temporal successivity. The relation between the events
desconhecer (’didn’t know’) and intimidação (’intimidating’) illustrates this feature (cf. A.2.1).
The TLink Before also occurs often in our dataset, which is in accordance with the general
structure of news: it is the third most frequent TLink both among the events that compose the
story and among the events across reports. In fact, after making the main event known, news usually
gives an account of earlier events. In our running example in Appendix A.1, the events disparos
(’firing’) and cessar (’stop’), represented in the second bubble of the picture in Appendix A.2.1,
showcase this temporal relation.
The analysis of the temporal relations between the events described by each report block
(cf. column B of Table 1) reveals different results. Even though the chronological order is also
frequently adopted, as one can observe in the Bubble representation in A.2.1, After ranks below the
simultaneity relation. The simultaneity and inclusion relations are also common (cf. Table 1),
which is compatible with the fact that, again and again, the journalist provides more detail
about the main event, describing either secondary events that happen at the same time or
subevents that are part of an event. Once again, this is true for instances of our running example,
such as the inclusion relation between shooting and refusing to stop, two secondary events
reported in the first report block (cf. first Bubble in Appendix A.2.1).
Overall, the results disclose a wide and rich variety of temporal relations between all the
story’s events. The events embedded in the reports follow the general trend of being linked by
Identity, but they describe simultaneous situations more frequently than the story’s events
considered as a whole.
{| class="wikitable"
|+ Table 2: Event Type and TLinks, as count (percentage)
! Type of Events !! After !! Before !! Simultaneous !! Includes !! Is_Included
|-
| ’Transition’, ’Transition’ || 422 (61.26%) || 284 (50.27%) || 132 (36.16%) || 20 (6.51%) || 44 (16.3%)
|-
| ’Transition’, ’State’ || 104 (15.09%) || 79 (13.98%) || 39 (10.68%) || 9 (2.93%) || 88 (32.59%)
|-
| ’Transition’, ’Process’ || 59 (8.56%) || 29 (5.13%) || 21 (5.75%) || 3 (0.98%) || 30 (11.11%)
|-
| ’State’, ’Transition’ || 44 (6.39%) || 66 (11.77%) || 73 (20%) || 180 (58.55%) || 16 (5.93%)
|-
| ’Process’, ’Transition’ || 43 (6.24%) || 40 (7.08%) || 27 (7.40%) || 21 (6.84%) || 8 (2.96%)
|}
Analysing the association between some of the events’ grammatical features, particularly
Class, Type, Tense and PoS, and the temporal relations leads to relevant conclusions. The first
conclusion drawn from the results is that PoS and Tense are not pertinent factors when inferring
temporal relations. As for the association of Class with TLinks, since one can group the
different labels into eventive and stative situations, and because the tag Type is more specific
concerning aspectual differences, we deemed it best to analyse the latter. As a matter of fact,
regarding the label Type, there seems to be a correlation between the values attributed to events
and the temporal relation. This correlation concerns the aspectual properties of telicity and
durativity. Telicity is the property of situations that cannot last indefinitely, as they have an
intrinsic terminal point in their internal temporal structure. This aspectual property distinguishes
Transitions, i.e., dynamic events corresponding to [39]’s Accomplishments (e.g. ’read a book’,
’paint a house’) and Achievements (e.g. ’win a race’, ’reach the summit’), from Processes (also
dynamic events, but atelic, corresponding to Vendler’s Activities, like ’sleep’, ’swim’) and States
(non-dynamic events, like ’be tall’, ’like’, ’live’).
In most cases, temporal succession readings (which subsume the After and Before TLinks)
involve two Transitions, or at least one Transition (cf. Table 2). For
example, regarding TLink_Before, there were 284 cases (50.27%) involving only Transitions,
while with TLink_After there were 422 cases (61.26%). By contrast, TLink_Before connects
Transitions and States in only 25.75% of the cases and Transitions and Processes in 12.21% of
the cases, whereas TLink_After connects Transitions and States in 21.48% of the cases and
Transitions and Processes in 14.8% of the cases.
The number of cases in which both situations are non-telic, i.e., Processes or States, is
considerably smaller (cf. Table 2). For instance, the cases involving only States were 22 (3.89%)
(TLink_Before) and 15 (2.18%) (TLink_After), while those involving only Processes were 10
(1.77%) (TLink_Before) and 8 (1.16%) (TLink_After).
In summary, although it is not a categorical distinction, there is a clear correlation between
telicity and temporal successivity.
Simultaneity readings seem to be correlated with another aspectual property – duration.
States and Processes are durative; Transitions may or may not be durative (as this tag subsumes
two aspectual types: Accomplishments, which are durative, and Achievements, which are
non-durative (cf. [39])). Simultaneity readings are related to three TLinks: TLink_Is_Included,
TLink_Includes and TLink_Simultaneous. The first two links show a clear preference for in-
cluding Transitions in the time interval of States, as presented in Table 2 (58.55% with the
TLink_Includes and 32.59% with the TLink_is_Included). On the contrary, the numbers in-
volving only Transitions (i.e., where a Transition is located in the time interval of another
Transition) are considerably lower (cf. Table 2: 6.51% with the TLink_Includes and 16.3%
with the TLink_Is_Included). However, TLink_Simultaneous presents different results from
the previous ones, as the most significant number of cases involves two Transitions (36,16%)
(cf. Table 2). The examples in which Transitions are simultaneous with States correspond to
30.68% cases, while those in which Transitions are simultaneous with Processes correspond
to only 13.15% (cf. Table 2). The relatively high number of cases of two Transitions related
by TLink_Simultaneous is likely related to the fact that, in the news, different situations are
mentioned that are concomitant with the central situation of the news, which is described in
the lead.
To sum up, although there are no absolute restrictions, it appears that simultaneity readings
are related to durative situations, particularly when the TLinks Is_Included and Includes are
involved. 4
4. Conclusions
The purpose of the study presented in this paper was to investigate the temporal structure of
news articles at two levels: the story and report levels. Three research questions were formulated.
With respect to the first research question (RQ1), "How are the events temporally organised at the
story level?", our analysis revealed that, in the majority of cases, the journalist repeats the
event introduced in the lead throughout the news article, which makes Identity the most frequent TLink.
However, the additional information provided by the journalist about the main event tends to
follow a chronological order, which is why TLink After is also frequently observed. Regarding
RQ2, which aims to investigate the role of the report level in the temporal organisation of
news stories, we can conclude that reporting events hold significant importance in this type
of news article. Though not all events in the story are introduced by reporting events, when
they are, the most frequent relation within each block of the reporting event is simultaneity.
Therefore, there is a somewhat distinct pattern of temporal organisation depending on whether
we consider each reporting event or the entire news article. In relation to our third research
question (RQ3), which is concerned with the relationship between the temporal relations and
the grammatical characteristics of events, our study indicates that Type is the most significant
attribute. Additionally, we have observed a correlation between telic situations and temporal
successiveness, and durative situations and temporal simultaneity.
In summary, our linguistic analysis provided valuable insights into how news articles convey
events. The linguistic features studied here, especially those that determine temporal relations,
can be used to improve models that retrieve or predict temporal information. The
visualisations generated by the pipeline we developed were crucial in automatically generating
temporal relations, verifying manual annotations, and analysing temporal structures. Our
study has provided a comprehensive understanding of the narrative temporal structure of news
articles, contributing to the field of narrative analysis.
4 This same correlation between temporal readings and aspectual properties, in particular the fact that telicity
triggers temporal succession, whereas durativity triggers simultaneity, was also pointed out for independent reasons
in other works (cf. [40] for the analysis of adverbial perfect participle clauses in Portuguese and English).
Acknowledgments
This work is financed by national funds through the FCT - Fundação para a Ciência e a Tecnologia,
I.P. (Portuguese Foundation for Science and Technology) within the project StorySense, with
reference 2022.09312.PTDC. It was also funded by national funds through FCT – Fundação para
a Ciência e a Tecnologia, I.P., within the project UIDB/00022/2020.
References
[1] A. Leeuwenberg, M.-F. Moens, Temporal information extraction by predicting relative
time-lines, in: Proceedings of the EMNLP’18, ACL, Brussels, Belgium, 2018, pp. 1237–1246.
[2] B. Santana, R. Campos, E. Amorim, A. Jorge, P. Silvano, S. Nunes, A survey on narrative
extraction from textual data, Artif Intell Rev (2023).
[3] I. Zahid, H. Zhang, F. Boons, R. Batista-Navarro, Towards the automatic analysis of
the structure of news stories, CEUR Workshop Proceedings 2342 (2019) 71–79. 2nd
International Workshop on Narrative Extraction From Texts, Text2Story 2019 ; Conference
date: 14-04-2019.
[4] A. Bell, The language of news media, Blackwell, 1991.
[5] J. Chovanec, Pragmatics of Tense and Time in News, Pragmatics & Beyond New Series,
John Benjamins Publishing Company, Amsterdam, 2014.
[6] P. Silvano, E. Amorim, A. Leal, I. Cantante, F. Silva, A. Jorge, R. Campos, S. Nunes,
Annotation and Visualisation of Reporting Events in Textual Narratives, in: R. Campos, A. Jorge,
A. Jatowt, S. Bhatia, M. Litvak (Eds.), CEUR Workshop Proceedings, volume 3370 of CEUR
Workshop Proceedings, 2023, pp. 47–62.
[7] J. C. Harry, Journalistic quotation: Reported speech in newspapers from a semiotic-
linguistic perspective, Journalism 15 (2014) 1041–1058.
[8] S. Pareti, Towards a discourse resource for Italian: developing an annotation schema for
attribution, Master’s thesis, Faculty of Letters and Philosophy, University of
Pavia, Italy, 2009.
[9] G. Palshikar, S. Pawar, S. Patil, S. Hingmire, N. Ramrakhiyani, H. Bedi, P. Bhattacharyya,
V. Varma, Extraction of Message Sequence Charts from Narrative History Text, in:
Proceedings of the First Workshop on Narrative Understanding, 2019, pp. 28–36.
[10] S. Hingmire, N. Ramrakhiyani, A. K. Singh, S. Patil, G. Palshikar, P. Bhattacharyya, V. Varma,
Extracting Message Sequence Charts from Hindi Narrative Text, in: Proceedings of the
First Joint Workshop on Narrative Understanding, Storylines, and Events, 2020, pp. 87–96.
[11] S. Liu, Y. Wu, E. Wei, M. Liu, Y. Liu, Storyflow: Tracking the evolution of stories, IEEE
Transactions on Visualization and Computer Graphics 19 (2013) 2436–2445.
[12] Q. Chen, S. Cao, J. Wang, N. Cao, How does automation shape the process of narrative
visualization: A survey of tools, IEEE Transactions on Visualization and Computer
Graphics (2023).
[13] ISO-24617-1, Language resource management - Semantic annotation framework (SemAF) -
Part 1: Time and events (SemAF-Time, ISO-TimeML), Standard, Geneva, CH, 2012.
[14] A. Bell, News time, Time & Society 4 (1995) 305–328.
doi:10.1177/0961463X95004003003.
[15] C. Schokkenbroek, News stories: Structure, time and evaluation, Time & Society 8 (1999)
59–98. doi:10.1177/0961463X99008001004.
[16] J. Sanders, K. van Krieken, Traveling through narrative time: How tense and temporal
deixis guide the representation of time and viewpoint in news narratives, Cognitive
Linguistics 30 (2019) 281–304. doi:10.1515/cog-2018-0041.
[17] I. Mani, B. Schiffman, Temporally anchoring and ordering events in news, 2004.
[18] A. Setzer, Temporal Information in Newswire Articles: an Annotation Scheme and Corpus
Study, Ph.D. thesis, University of Sheffield, 2001.
[19] E. Filatova, E. Hovy, Assigning time-stamps to event-clauses, in: Proceedings of the
Workshop on Temporal and Spatial Information Processing - Volume 13, TASIP ’01, Association
for Computational Linguistics, USA, 2001, pp. 1–8. doi:10.3115/1118238.1118250.
[20] J. Pustejovsky, R. Ingria, R. Saurí, J. M. Castaño, J. Littman, R. J. Gaizauskas, A. Setzer,
G. Katz, I. Mani, The specification language TimeML, in: I. Mani, J. Pustejovsky, R. J.
Gaizauskas (Eds.), The Language of Time - A Reader, Oxford University Press, 2005, pp.
545–558.
[21] J. Pustejovsky, K. Lee, H. Bunt, L. Romary, ISO-TimeML: An international standard for
semantic annotation, in: Proceedings of the Seventh International Conference on Language
Resources and Evaluation (LREC’10), European Language Resources Association (ELRA),
Valletta, Malta, 2010. URL:
http://www.lrec-conf.org/proceedings/lrec2010/pdf/55_Paper.pdf.
[22] F. Costa, Processing Temporal Information in Unstructured Documents, Ph.D. thesis,
Faculdade de Ciências da Universidade de Lisboa, 2012.
[23] R. Campos, Disambiguating Implicit Temporal Queries for Temporal Information Retrieval
Applications, Ph.D. thesis, Faculdade de Ciências da Universidade do Porto, 2013.
[24] R. Campos, G. Dias, A. M. Jorge, C. Nunes, Identifying top relevant dates for implicit
time sensitive queries, Information Retrieval Journal 20 (2017) 363–398. URL:
https://api.semanticscholar.org/CorpusID:254583715.
[25] Q. Ning, H. Wu, D. Roth, A multi-axis annotation scheme for event temporal
relations, in: Proceedings of the 56th Annual Meeting of the Association for
Computational Linguistics (Volume 1: Long Papers), Association for Computational
Linguistics, Melbourne, Australia, 2018, pp. 1318–1328. URL: https://aclanthology.org/P18-1122.
doi:10.18653/v1/P18-1122.
[26] H. Sousa, R. Campos, A. Jorge, Tei2go: A multilingual approach for fast temporal expression
identification, in: Proceedings of the 32nd ACM International Conference on Information
and Knowledge Management, CIKM ’23, Association for Computing Machinery, New York,
NY, USA, 2023, pp. 5401–5406. doi:10.1145/3583780.3615130.
[27] H. Sousa, R. Campos, A. M. Jorge, Tieval: An evaluation framework for temporal
information extraction systems, in: Proceedings of the 46th International ACM SIGIR
Conference on Research and Development in Information Retrieval, SIGIR ’23,
Association for Computing Machinery, New York, NY, USA, 2023, pp. 2871–2879.
doi:10.1145/3539618.3591892.
[28] R. Campos, G. Dias, A. M. Jorge, A. Jatowt, Survey of temporal information retrieval and
related applications, ACM Comput. Surv. 47 (2014). doi:10.1145/2619088.
[29] A. Rogers, G. Smelkov, A. Rumshisky, Narrative time: Dense high-speed temporal
annotation on a timeline, CoRR abs/1908.11443 (2019). URL: arxiv.org/abs/1908.11443.
arXiv:1908.11443.
[30] L. Derczynski, Automatically Ordering Events and Times in Text, 2017.
[31] ISO-24617-4, Language resource management - Semantic annotation framework (SemAF) -
Part 4: Semantic roles (SemAF-SR), Standard, Geneva, CH, 2014.
[32] ISO-24617-7, Language resource management - Semantic annotation framework (SemAF) -
Part 7: Spatial information, Standard, Geneva, CH, 2020.
[33] ISO-24617-9, Language resource management - Semantic annotation framework (SemAF) -
Part 9: Reference annotation framework (RAF), Standard, Geneva, CH, 2019.
[34] P. Silvano, A. Leal, F. Silva, I. Cantante, F. Oliveira, A. Mario Jorge, Developing a multilayer
semantic annotation scheme based on ISO standards for the visualization of a newswire
corpus, in: Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic
Annotation, Association for Computational Linguistics, Groningen, The Netherlands
(online), 2021, pp. 1–13. URL: https://aclanthology.org/2021.isa-1.1.
[35] A. Leal, P. Silvano, E. Amorim, I. Cantante, F. Silva, A. Mario Jorge, R. Campos, The
place of ISO-space in Text2Story multilayer annotation scheme, in: Proceedings of the
18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022,
European Language Resources Association, Marseille, France, 2022, pp. 61–70. URL:
https://aclanthology.org/2022.isa-1.8.
[36] L. Deleger, Q. Li, T. Lingren, M. Kaiser, K. Molnar, L. Stoutenborough, M. Kouril, K. Marsolo,
I. Solti, Building gold standard corpora for medical natural language processing tasks,
AMIA ... Annual Symposium proceedings. AMIA Symposium 2012 (2012) 144–153. URL:
https://europepmc.org/articles/PMC3540456.
[37] A. Brandsen, S. Verberne, M. Wansleeben, K. Lambers, Creating a dataset for named
entity recognition in the archaeology domain, in: N. Calzolari, F. Béchet, P. Blache,
K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo,
A. Moreno, J. Odijk, S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources
and Evaluation Conference, European Language Resources Association, Marseille, France,
2020, pp. 4573–4577. URL: https://aclanthology.org/2020.lrec-1.562.
[38] L. J. V. Miranda, Developing a named entity recognition dataset for Tagalog, 2023.
arXiv:2311.07161.
[39] Z. Vendler, Verbs and times, The philosophical review 66 (1957) 143–160.
[40] P. Silvano, A. Leal, J. Cordeiro, On adverbial perfect participial clauses in Portuguese
varieties and British English, Romance Languages and Linguistic Theory 2018: Selected
papers from ’Going Romance’ 32, Utrecht 357 (2021) 263–286.
A. An Example from our Dataset
A.1. The news article
A GNR feriu ligeiramente com “bagos de borracha” (balas de borracha) um homem em Cabeceiras
de Basto que estava a fazer disparos para o ar com munições reais e que se recusou a parar,
disse hoje fonte policial.
O caso deu-se na freguesia de Refojos de Basto, concelho de Cabeceiras de Basto, distrito de
Braga, um pouco antes das 00:00.
O homem, de 52 anos, foi atingido pelas balas de borracha nas pernas.
Fonte do Comando Territorial da GNR em Braga disse à agência Lusa que os operacionais só
atingiram o homem depois de este desobedecer à ordem de cessar os disparos.
“Não parou sequer quando os militares fizeram disparos de intimidação para o ar”,
acrescentou a fonte, que afirmou desconhecer as causas do comportamento do detido.
Depois de tratado a ferimentos ligeiros em unidade hospitalar, foi detido no posto da GNR
em Cabeceiras e será hoje presente a juiz de instrução para fixação das medidas de coação tidas
por convenientes.
Como o caso envolveu armas de fogo, a Polícia Judiciária vai ser informada de detalhes,
indicou a fonte.
The GNR slightly wounded, with "rubber bullets", a man in Cabeceiras de Basto who was
firing shots into the air with live ammunition and refused to stop, a police source said today.
The incident took place in the parish of Refojos de Basto, in the municipality of Cabeceiras
de Basto, Braga district, just before 00:00.
The 52-year-old man was hit by rubber bullets in the legs.
A source from the GNR’s Territorial Command in Braga told Lusa news agency that the
officers only hit the man after he disobeyed the order to stop firing.
"He didn’t even stop when the soldiers fired intimidating shots into the air," added the source,
who said he didn’t know the cause of the detainee’s behaviour.
After being treated for minor injuries at a hospital, he was detained at the GNR post in
Cabeceiras and will be brought before an investigating judge today so that the coercive measures
deemed appropriate can be set.
As the case involved firearms, the Judicial Police will be informed of the details, the source
said.
A.2. The temporal structure representations
A.2.1. Bubble representation
Figure 1: The bubbles follow a chronological order. The first big bubble, representing a reporting event,
is positioned at noon, and the subsequent big bubbles, following the hourly pattern, represent reporting
events that occur later. This allows us to discern the sequence of reporting events in the text based on
the order of the big bubbles. Each reporting event also contains events within it. These events are the
ones that have been declared or reported by someone and are also arranged chronologically, similar to
the big bubbles. Finally, the temporal relationships between these events are depicted through arrows
connecting the bubbles or rectangles. In this example, the first reporting event is disse ('said') and the
number “4.0” indicates that this event is in the fourth sentence of the document. Also, attached to
this event, there are the following events: feriu ('wounded'), fazer ('make'), disparos ('shots'), recusou
('refused'), parar ('stop'). The dashed arrows connect events inside the same big bubble and the solid
arrows connect events of different big bubbles.
A.2.2. MSC representation
Figure 2: This Message Sequence Chart (MSC) illustrates the sequence of events within the story layer.
Each colour corresponds to a specific class of events: orange for the Occurrence class, green for the
I_Action class, light purple for the State class, and blue for reporting events. The events are arranged
from left to right to reflect their sequential order in the narrative. Additionally, the diagram depicts the
temporal relations between these events.
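A left-to-right arrangement of this kind can be derived from ordering links alone. The sketch below is one possible way to do so, using a topological sort from the Python standard library. It is not the pipeline used to produce the figure. The function name `chronological_order` is hypothetical, and the three links are illustrative readings of the example article in A.1, not the actual annotations. Only Before and After links are considered, and any other relation is ignored.

```python
from graphlib import TopologicalSorter


def chronological_order(events, tlinks):
    """Order events so that every Before/After link is respected.

    `tlinks` holds (source, relation, target) triples. Raises
    graphlib.CycleError if the links are contradictory.
    """
    # Start with every event as a node without predecessors.
    sorter = TopologicalSorter({event: set() for event in events})
    for source, relation, target in tlinks:
        if relation == "Before":
            # source precedes target
            sorter.add(target, source)
        elif relation == "After":
            # source follows target
            sorter.add(source, target)
    return list(sorter.static_order())


# Illustrative links only (assumed readings, not corpus annotations).
events = ["detido", "atingiram", "tratado", "desobedecer"]
tlinks = [
    ("atingiram", "After", "desobedecer"),  # hit after he disobeyed
    ("atingiram", "Before", "tratado"),     # hit before being treated
    ("tratado", "Before", "detido"),        # treated before being detained
]
order = chronological_order(events, tlinks)
```

Because the three links form a single chain, only one ordering satisfies them. With sparser links, several orderings may be valid, and the sort returns one of them.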