Untangling a Web of Temporal Relations in News Articles

Purificação Silvano1,*, Evelin Amorim2, António Leal1, Inês Cantante1, Alípio Jorge2,3, Ricardo Campos2,4 and Nana Yu1

1 University of Porto / CLUP, Porto, Portugal
2 INESC TEC, Porto, Portugal
3 University of Porto, Porto, Portugal
4 University of Beira Interior, Covilhã, Portugal

Abstract
Temporal reasoning has been the focus of several studies during the past years, both in linguistics and computational studies. Although advances on this topic are undeniable, there are still improvements to be made and new avenues to pursue. One relevant problem concerns the temporal ordering of the events, particularly asserting and representing how events are temporally related and how the story told in the narrative evolves. This paper aims to analyse the temporal structure of narratives present in news articles with the aid of different visualisations. To this end, we annotated a dataset of 119 news articles in European Portuguese following an annotation scheme that combines different parts of ISO 24617, Language Resource Management - Semantic Annotation Framework (SemAF). The temporal layer of this annotation scheme identifies the events and their main features, as well as the temporal links between the events. The annotation provided us with paramount information about the temporal characteristics of news at two levels: the story and the report levels. The visualisations that we propose facilitate the process of understanding how news is temporally organised, providing a more practical means to observe it.

Keywords
temporal structure, narrative news, annotation, visualisations

1. Introduction
Understanding the temporal dimension of a text requires more than merely identifying events and temporal expressions. Ultimately, it requires building structured information about events and inferring their temporal relations before constructing a timeline of events.
Such tasks are still very challenging in Natural Language Processing (NLP) and Information Retrieval (IR), and they can surely benefit from the input of linguistic analysis. Currently, temporal extraction comprises three phases: (i) recognition of events and temporal expressions; (ii) recognition of temporal relations between them; and (iii) time-line constructions based on the temporal relations [1]. The first phase has been widely researched and carried out with a high degree of success (cf. [2] for the state-of-the-art about temporal-related identification and extraction tasks). However, the last two phases are more complex, and although some research has produced encouraging results (cf. [2] for an overview), many problems persist, requiring further work. Determining the temporal organisation of events becomes even more challenging when dealing with texts that present events in a non-chronological order, like news articles, which display an intricate temporal structure compared to other types of narratives [3].

In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (eds.): Proceedings of the Text2Story’24 Workshop, Glasgow (Scotland), 24-March-2024
* Corresponding author.
Email: msilvano@letras.up.pt (P. Silvano); evelin.f.amorim@inesctec.pt (E. Amorim); jleal@letras.up.pt (A. Leal); cantante.ines@gmail.com (I. Cantante); amjorge@fc.up.pt (A. Jorge); ricardo.campos@inesctec.pt (R. Campos); robertananayu@hotmail.com (N. Yu)
URL: https://www.purisilvano.pt (P. Silvano); https://github.com/evelinamorim (E. Amorim); http://www.ccc.ipt.pt/~ricardo/ (R. Campos)
ORCID: 0000-0001-8057-5338 (P. Silvano); 3C18-B766-4A42 (E. Amorim); 0000-0002-6198-2496 (A. Leal); 0009-0002-3866-4550 (I. Cantante); 0000-0002-5475-1382 (A. Jorge); 0000-0002-8767-8126 (R. Campos); 0000-0003-4378-088X (N. Yu)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
It is widely accepted that news stories are narratives, and as such, they can be analysed within the conceptual framework of narratology [4]. However, contrary to other narratives, not only is there a different linear arrangement of the narrative components, but also a reappearance or ’recycling’ of the most critical aspects of the story throughout the news text [5], which has implications for the temporal structure of news. Moreover, as demonstrated by [6], news articles frequently contain two narratives: the narrative of reporting the story targeted by the news, and the story itself, composed of the reported events. Reported speech is a common technique used in news writing [7]. Journalists use direct quotations to reproduce what others have said and indirect quotations to describe what others think or say. This method is crucial when writing news, as reporters usually report on what their sources of information have to say about what they have seen or know. Reported speech and quotation are often treated as attribution relations [8], that is, connections between pieces of information and the sources that express them. Thus, news can have two levels of narrative: one that explains the events that make up the story being narrated (what happened, where, when, and why), which is the sequence of events that is the news topic, and another that describes the sources that provided the information to the journalist who wrote the news (who said what to the journalist). These two levels appear in textual sequences that alternate throughout the news text. Separating the two narratives is crucial in determining the chronological order of the story’s events.

Regarding timelines constructed from temporal relations, researchers have explored different formats. For instance, [9] and [10] used Message Sequence Charts (MSC) to represent events and their relations in a temporal order.
[6] used a visual representation that employs an analog clock to identify reporting events, the sources and their nested events. Regarding visualisations devoted to the general public, most proposals focus on infographics and timelines [11, 12]. Regardless of the format, visualisations are quite useful because, besides the automatic determination of the temporal order of events, they allow manual inspection of the temporal arrangement of events to get insights quickly or to perform a deep analysis of some structure. However, in most cases, such visualisations lack more specific information about the events, which can be provided by manual annotation. This kind of information is essential for a deeper analysis of the narrative temporal structure.

For this study, we defined three objectives. First, we aim to analyse the intricate relations between the events throughout the narrative of the news annotated according to ISO-24617-1 [13]. Second, we work towards assessing how the story events are organised within each report and across the reports. Finally, we seek to determine associations between the temporal relations and some of the grammatical features of the events. The representation of the temporal relations in two formats, the Bubble data structure proposed in [6] and the Message Sequence Chart [9], enables a swift and time-saving analysis of temporal features of news articles.

The paper is organised as follows. Section 2 reviews some proposals about the temporal structure of news. Section 3 describes the study, beginning with the description of the problem and the research questions (3.1), followed by relevant information about the dataset and the annotation scheme (3.2) and the study’s methodology (3.3). Subsection 3.4 reports on the results of our study. Finally, Section 4 presents the overall conclusions.

2. Temporal structure of news articles
From a solely linguistic approach, several research works have yielded significant findings regarding news temporal structure [4, 5, 14, 15, 16]. News articles are described as starting with the main event in the lead, going back to earlier events, and presenting details in instalments [14] in the body of the news, in a cyclic or “zigzagging pattern, with the time-line repeatedly moving into the past and the future concerning the main event” [5]. For this reason, several analyses describe news stories as non-chronological, “at odds with the linear narrative point” [4]. However, [15] argues that Bell’s analysis of news as non-chronological may be too hasty because a closer examination of news temporal structure from a linguistic perspective unravels the matching between the discourse structure and the underlying event structure. The author presents evidence that events are frequently told in the order they occur. Whenever they are not, the underlying order of the events can still be interpreted due to some linguistic devices, contrary to what would happen if a story were truly non-chronological. In fact, a controlled experiment with three subjects conducted by [17] disclosed that humans were quite capable of untangling the order of the events.

From a computational point of view, some studies propose annotation schemes, methods and techniques to identify and retrieve temporal information from news [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28] (cf. also [29] for an overview of current temporal annotation schemes and corpora). Naturally, regardless of all the progress, many problems persist (cf. [30] for an overview), mainly because determining the chronology of events depends on multiple factors, such as linear order of discourse, tense, temporal expressions, sentence meaning, and aspect. As discussed by [17], although a computer easily computes the first three, the last two are more difficult to process.
Moreover, the complexity of temporal information retrieval of news is amplified by the presence of reports with the mention of sources or attributions, which create a second narrative layer, the narrative of the report. Each report comprises several events from the story [6] temporally linked within and across the different reports. Identifying and extracting temporal relations among the events that compose the story can be a troublesome endeavour, let alone establishing all the temporal links between the story events and those that report the storyline. To the best of our knowledge, none of the existing studies takes on the annotation, analysis, and representation of the temporal relations between the two layers of telling a story in news articles: the narrative of the report and the narrative of the story. Nonetheless, such a study can produce relevant input for the task of temporal information retrieval.

3. The study
3.1. Problem and research questions
In this study, our objective is to characterise the temporal structure of narrative news. However, since news comprises two intertwined levels, the level of the story being told and the level of the report that tells that story [6], it is our aim not only to determine the temporal structure of the story but also to establish the role of the different report blocks in its temporal organisation. Furthermore, with this study, we seek to ascertain which main features of events are associated with the different temporal relations, that is, to figure out if one can relate a specific temporal relation between two events to certain grammatical characteristics of those events. To this end, we formulated three research questions:
RQ1: How are the events temporally organised at the story level?
RQ2: What is the report level’s role in the temporal organisation of the story?
RQ3: What is the association between the temporal relations and some of the events’ grammatical features?
The three research questions were created to help define the typical structure of news, and, ultimately, assist in the development of automatic methods for information extraction. The first question aims to define the prototypical temporal structure of news and provide important input for automatic forms of temporal analysis. Extracting temporal information from unstructured data can greatly benefit journalists and other stakeholders. However, it is a challenging task. Creating timelines from text is relatively easy if temporal expressions unambiguously locate situations, but it becomes more complicated if the text lacks temporal expressions. Therefore, understanding how situations are organised throughout the news can help with automatic extraction. The second question is related to the first one because, to extract information and organise it temporally, we need to distinguish between situations that are part of the story being narrated and those that provide information about the sources used by the journalist to write the news. Both types of situations occur alternately in the news, making it difficult to classify them automatically without some linguistic input. The third question aims to deepen our understanding of the characteristics of news events, which will impact both theoretical linguistics and computer science. Determining the linguistic features of events is essential for automatic methods of extracting and organising information, particularly in cases where there are no explicit temporal adverbials or discourse markers. For example, in the sequence of sentences "The boys played football. The girls swam in the pool.", the preferred temporal connection between the situations is simultaneity, whereas in "The boys broke the window glass. The girls tore the curtain.", the preferred temporal relationship is successivity. Understanding which grammatical elements contribute to these temporal relations is key to temporal information extraction.

3.2. Dataset and annotation
For this study, we used the T2S Lusa Annotated Dataset¹, which is a set of 119 news texts retrieved from the T2S Lusa dataset². The T2S Lusa dataset contains 360 news articles collected from the general news feed of Lusa, a Portuguese news agency. First, the team of linguists conducted a preliminary manual analysis of a small set of Lusa news, and identified a set of keywords that were more frequent in the news with a narrative nature, such as "assaults", "robberies", "accidents", or "police interventions". Afterwards, Lusa collected the news containing these keywords with a length restriction of 50 to 200 words. Finally, the linguists manually checked this collection, resulting in a corpus of 360 news articles.

The data was annotated following a previously designed multilayered annotation scheme, which combines four parts of ISO-24617 Language resource management: Part 1 – Time and Events [13]; Part 4 – Semantic Roles [31]; Part 7 – Spatial Information [32]; Part 9 – Reference Annotation Framework [33] (cf. [34] and [35]). It comprises two types of structures: entity structures (events, times, participants, measures, and spatial relations) and link structures (temporal, aspectual, spatial, subordination, objectal, and semantic role links). The focus of the current study is on tags that describe events and temporal links.

Events are annotated to identify eventualities (i.e. events and states) and their specific characteristics using the Event tag, which specifies values for Class, Type, Part of Speech, and Tense. Class identifies some characteristics of the lexical semantics of verbs. For example, the Reporting value is given to events in the case of verbs that denote a situation in which an entity reports a story or provides information about a particular situation.
Type is related to the Aspect of the situation, which determines its internal temporal structure, and can be a State (a situation in which something obtains or holds), a Process (a durative and atelic situation), or a Transition (a situation introducing a consequent state). Temporal links (TLinks) are used to identify temporal relations between events, which are critical in narrative texts for determining the chronological order of the events. TLinks have the following values: Before, After, Includes, Is_Included, During, Simultaneous, Identity, Begins, Ends, Begun_By, and Ended_By.

¹ https://rdm.inesctec.pt/en/dataset/cs-2023-018
² https://rdm.inesctec.pt/dataset/cs-2023-015

The dataset of 119 news articles was annotated by a PhD student in Linguistics who collaborated in the development of the annotation scheme. The annotator discussed problematic cases with a team of linguists before carrying out the annotations. To ensure the accuracy of the annotations, a second annotator followed the guidelines of the annotation manual and annotated a sample of 10% of the dataset (19 news articles) to test inter-annotator agreement. Regarding the inter-annotator agreement, first, we measured the agreement of the events labelling. For this task, the agreement between the two annotators was computed as a pairwise F1. The choice of F1 instead of Cohen’s Kappa in the event labelling is due to the high number of non-labelled tokens (disregarded for this particular study), which can raise the kappa score disproportionately [36, 37, 38]. The results reveal that the pairwise F1-score was 0.77, which is substantial agreement. With respect to the attributes, we computed the classical Cohen’s Kappa score for the attributes Class (0.63), Type (0.51), Part of Speech (0.81), and Tense (0.74), the attributes relevant to the present study, where the agreement is also substantial, except the agreement for Type, probably the most challenging, which is moderate.
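The two agreement measures mentioned above can be illustrated with a minimal sketch. It is not the authors' evaluation script: the function names, the toy labels, and the choice to compare event spans by exact match are assumptions made only for illustration.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


def pairwise_f1(spans_a, spans_b):
    """F1 between two annotators' event spans (exact match is an assumption)."""
    spans_a, spans_b = set(spans_a), set(spans_b)
    overlap = len(spans_a & spans_b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(spans_b)
    recall = overlap / len(spans_a)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): Type labels for four events marked by both annotators.
type_a = ["State", "Process", "Transition", "Transition"]
type_b = ["State", "Transition", "Transition", "Transition"]
kappa = cohen_kappa(type_a, type_b)

# Toy data (hypothetical): character offsets of the events each annotator marked.
events_a = [(0, 5), (10, 15), (20, 28)]
events_b = [(0, 5), (10, 15), (30, 35), (40, 44)]
f1 = pairwise_f1(events_a, events_b)
```

F1 only looks at the spans that at least one annotator marked, so the many unlabelled tokens do not enter the score, which is the motivation given above for preferring it over kappa in event labelling.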
Concerning the temporal links between the events, Cohen’s Kappa was 0.31, and the agreement of the attributes of the temporal links resulted in a Cohen’s Kappa value of 0.32, both considered fair. The difficulty of the task can explain these lower numbers. In future work, we will perform more annotation experiments to explain the reasons for this discrepancy.

3.3. Methodology
To answer our first research question (How are the events temporally organised at the story level?), we focused on the temporal relations (TLinks) between all the events that comprise the story, excluding the reporting events belonging to the report level, which are connected by the TLink Identity. This strategy excluded the reporting events that are part of the second level of discourse. As explained in [6], the reporting events that compose this second level of discourse are classified as belonging to the class Reporting and are linked by the TLink Identity because, in a given piece of news, they integrate the same report carried out by different reporting events. Some examples are verbs such as informou (’informed’) and declarou (’declared’), but also segundo (’according to’) + noun phrase or de acordo com (’according to’) + noun phrase, which are also markers of reporting events. Accordingly, when analysing the temporal relations between the story events, in an example like the one in A.1, we excluded the events classified as reporting (signalled in bold red) and linked among them by TLink Identity.

For our second research question (What is the report level’s role in the story’s temporal organisation?), we aimed to determine the reports’ role in organising the story’s timeline. We collected two types of information: (i) TLinks between all the events within each report; and (ii) the TLinks between the first story event of a report block B and any story event from the report block A. The example shown in Appendix A.1 illustrates this procedure.
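The selection steps just described can be sketched as follows. This is an illustrative simplification, not the text2story implementation: the dictionary layout, the field names `cls` and `block`, and the class value "Occurrence" are hypothetical, and the "across" counter accepts any link between story events of different report blocks rather than only links starting at the first story event of a block.

```python
from collections import Counter


def split_tlinks(events, tlinks):
    """Count story-level TLinks overall, within a report block, and across blocks.

    events: dict mapping event id -> {"cls": str, "block": int or None}
            (hypothetical layout; "block" is the report block the event sits in)
    tlinks: iterable of (source_id, relation, target_id)
    """
    story, within, across = Counter(), Counter(), Counter()
    for src, rel, tgt in tlinks:
        a, b = events[src], events[tgt]
        # Links that touch a reporting event belong to the report level.
        if a["cls"] == "Reporting" or b["cls"] == "Reporting":
            continue
        story[rel] += 1
        # Story events outside any report block only count at the story level.
        if a["block"] is None or b["block"] is None:
            continue
        if a["block"] == b["block"]:
            within[rel] += 1
        else:
            across[rel] += 1
    return story, within, across


# Toy annotation (hypothetical ids and classes).
events = {
    "e1": {"cls": "Reporting", "block": 1},
    "e2": {"cls": "Occurrence", "block": 1},
    "e3": {"cls": "Occurrence", "block": 1},
    "e4": {"cls": "Reporting", "block": 2},
    "e5": {"cls": "Occurrence", "block": 2},
    "e6": {"cls": "Occurrence", "block": None},
}
tlinks = [
    ("e1", "Identity", "e4"),   # report level: skipped
    ("e2", "Includes", "e3"),   # same report block
    ("e5", "After", "e3"),      # different report blocks
    ("e6", "Before", "e2"),     # story event outside any block
]
story, within, across = split_tlinks(events, tlinks)
```

Keeping three separate counters mirrors the three views of the data used in the study: all story events, events embedded in one report, and events linked across reports.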
There are five blocks, represented in the Bubble structure A.2.1: the first and the second reports are introduced by the reporting verb disse (’said’), the third by the reporting event acrescentou (’added’), the fourth by afirmou (’stated’) and the last one by indicou (’pointed’). These five blocks of the report form the report level. Each block, represented by a Bubble, includes multiple story events. So, first, we extracted all the TLinks connecting these events. Second, we extracted all the TLinks between the story events embedded in the different Bubbles.

For our third research question (What is the association between the temporal relations and some of the events’ grammatical features?), we thoroughly analysed the features of the events that form the story level. We aimed to uncover the connection between the grammatical characteristics of the events and each temporal relation. To achieve this, we collected all the relevant information from the tags Class, Type, Part of Speech (PoS), and Tense of the events linked by each of the TLinks likely to impact temporal relations.

The extraction and visualisation of these data were made possible by the package text2story (https://pypi.org/project/text2story/). This is a Python package devoted to extracting narratives automatically. Additionally, the package offers three types of visualisation for narratives: Knowledge Graph (KG), Message Sequence Chart (MSC), and Bubble Diagram (BM). In this study, we only employ the MSC and BM to visualise some news story elements. The MSC represents the events that form the story level as lifelines, i.e., the coloured rectangles in sequence with lines underneath. These events can be linked by TLinks that were annotated in the text. The sequence order of events in the MSC is the same order in which they appear in the text. This arrangement allows a more careful analysis of such types of events than if this information were in the annotation tool.
One example can be seen in Figure 2 in Appendix A.2. By contrast, the main goal of the Bubble Diagram is to represent the reporting events and each event related to the story attached to them. In this diagram, there is the Big Bubble, which represents a reporting event. Inside each Big Bubble are little bubbles, representing events at the story’s level. The bubbles, both the big ones and the little ones, are sorted clockwise, i.e., the first event that appears in the text is placed at the noon position, the second follows in a clockwise direction, and so on. The TLinks between the little bubbles are drawn in the figure as well. One example is in Figure 1 in Appendix A.2.

3.4. Results
In the set of the 119 annotated news articles, 3068 events were identified, occurring both at the story and report levels. There is only one news article devoid of the report level, which is evidence of the prominence of this part in the overall structure of the type of news that composes the dataset. The news article in Appendix A.1 illustrates the pattern followed by most of the news that were annotated. In this particular example, as one can observe in the Bubble diagram of Appendix A.2.1, the news includes five blocks of report marked by five reporting verbs. Hence, it is expected to encounter a reasonably high number of reporting events that build the report level. We found a total of 417 reporting events, an average of 3.50 per news article. At the story level, the number of events is necessarily higher, with a total of 2651 events, an average of 22.27 events per news article, because the purpose of this type of journalistic text is to inform the reader about a situation. So, returning to our running example in A.1, events such as feriu, fazer disparos, recusou, parar (’wounded’, ’shooting’, ’refused’, ’stop’) are part of the story level. In this example, we can observe that not all the events that compose the story are introduced by a reporting event.
For instance, the information given in the third paragraph about the event atingido (’hit’) is one of those cases. Table 1 systematises the analysis of the temporal relations across the two levels under scrutiny: the story and report levels.

Table 1
Temporal relations between events...
Relation        (A) ...of the story    (B) ...embedded in the reports    (C) ...across the reports
Identity        587                    255                               16
After           546                    173                               15
Before          420                    151                               7
Simultaneous    413                    176                               3
Includes        278                    125                               6
Is_included     232                    109                               5
During          39                     23                                0
Begun_by        4                      3                                 0
Begins          3                      1                                 0
Ends            2                      0                                 0
Ended_by        2                      1                                 0

Overall, the analysis of the temporal relations indicates that, for most cases, the temporal relation Identity, picked whenever two events are the same, is the most frequent, which is expected because, as explained in Section 2, prototypically, the journalist presents the main event in the news’ lead and then other events related to the main one are presented in more detail throughout the news. In our running example (A.1), this is true for the event fazer disparos (’shooting’), which is introduced in the first report block and then resumed in the second report block (cf. Appendix A.2.1). The recurrence of the TLink Identity is also related to some annotation rules, such as the one about light verb constructions like fizeram disparos, which stipulates that the noun (disparos) and the light verb (fizeram) must be linked by an Identity TLink. This is recommended in the ISO standard to capture the fact that, despite being two distinct words that have the potential to denote two different events, they represent a single event in the constructions in question.

Regarding the events that form the story, the second most common temporal relation is After, as shown in column A of Table 1. In this case, the events are presented chronologically, that is, they are described in the order they happened. This applies to most of the events depicted in paragraph six of our running example and represented in the MSC A.2.2.
Our findings align with the evidence presented in [15]’s study and described in Section 2. The same observation about the frequency of TLinks is true concerning the story events described in different report blocks (cf. column C of Table 1). Although the differences from other temporal relations are less marked, the first event of report B is typically linked to an event of report A by TLink After, which means that there is temporal successivity. The relation between the event desconhecer (’didn’t know’) and intimidação (’intimidating’) illustrates this feature (cf. A.2.1).

The TLink Before also occurs often in our dataset, which is in accordance with the general structure of news, being the third most recurrent TLink within the events that compose the story and the events across reports. In fact, after making the main event known, news articles usually give an account of earlier events. In our running example in Appendix A.1, the events disparos (’firing’) and cessar (’stop’), represented in the second bubble of the picture in Appendix A.2.1, showcase this temporal relation.

The analysis of the temporal relations between the events described by each report block (cf. column B of Table 1) reveals different results. Even though the chronological order is also frequently adopted, as one can observe in the Bubble representation in A.2.1, it comes after the simultaneity relation. The simultaneity and inclusion relations are also common (cf. Table 1), which is compatible with the fact that, again and again, the journalist provides more detail about the main event, describing either secondary events that happen at the same time or subevents that are part of an event. Once again, this is true for instances of our running example, such as the inclusion relation between shooting and refusing to stop, two secondary events reported in the first report block (cf. first Bubble in Appendix A.2.1).
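The rankings discussed for the three columns of Table 1 can be re-derived from the reported counts with a short sketch. The counts are copied from Table 1; the column keys and the helper name `ranked` are illustrative choices, not part of the original study.

```python
# Counts copied from Table 1: (A) story, (B) within reports, (C) across reports.
TABLE_1 = {
    "Identity": (587, 255, 16),
    "After": (546, 173, 15),
    "Before": (420, 151, 7),
    "Simultaneous": (413, 176, 3),
    "Includes": (278, 125, 6),
    "Is_included": (232, 109, 5),
    "During": (39, 23, 0),
    "Begun_by": (4, 3, 0),
    "Begins": (3, 1, 0),
    "Ends": (2, 0, 0),
    "Ended_by": (2, 1, 0),
}
COLUMNS = ("story", "within_reports", "across_reports")  # hypothetical keys


def ranked(column):
    """Return (relation, count, share in %) sorted by descending count."""
    idx = COLUMNS.index(column)
    total = sum(counts[idx] for counts in TABLE_1.values())
    rows = sorted(TABLE_1.items(), key=lambda item: item[1][idx], reverse=True)
    return [(rel, counts[idx], round(100 * counts[idx] / total, 1))
            for rel, counts in rows]


top_story = [rel for rel, _, _ in ranked("story")[:3]]
top_within = [rel for rel, _, _ in ranked("within_reports")[:3]]
top_across = [rel for rel, _, _ in ranked("across_reports")[:3]]
```

Sorting each column separately makes the contrast visible: Simultaneous overtakes After only for events embedded in the same report block.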
Overall, the results disclose a wide and rich variety of temporal relations between all the story’s events. The events embedded in the reports follow the general trend of being linked by Identity, but they describe situations that happen simultaneously more frequently than when considering only the story’s events.

Table 2
Event Type and TLinks (count, with percentage in parentheses)
Type of Events                 After           Before          Simultaneous    Includes        Is_Included
’Transition’, ’Transition’     422 (61.26%)    284 (50.27%)    132 (36.16%)    20 (6.51%)      44 (16.3%)
’Transition’, ’State’          104 (15.09%)    79 (13.98%)     39 (10.68%)     9 (2.93%)       88 (32.59%)
’Transition’, ’Process’        59 (8.56%)      29 (5.13%)      21 (5.75%)      3 (0.98%)       30 (11.11%)
’State’, ’Transition’          44 (6.39%)      66 (11.77%)     73 (20%)        180 (58.55%)    16 (5.93%)
’Process’, ’Transition’        43 (6.24%)      40 (7.08%)      27 (7.40%)      21 (6.84%)      8 (2.96%)

Analysing the association between some of the events’ grammatical features, particularly Class, Type, Tense, PoS, and temporal relations, leads to relevant conclusions. The first conclusion drawn from the results is that PoS and Tense are not pertinent factors when inferring temporal relations. In terms of the association of Class to TLinks, since one can group the different labels into eventive and stative situations, and because the tag Type is more specific concerning aspectual differences, we deemed it best to analyse the latter. As a matter of fact, regarding the label Type, there seems to be a correlation between the values attributed to events and the temporal relation. This correlation concerns the aspectual properties of telicity and durativity. Telicity is the property of situations that cannot last indefinitely, as they have an intrinsic terminal point in their internal temporal structure. This aspectual property distinguishes Transitions, i.e., dynamic events, corresponding to [39]’s Accomplishments (e.g. ’read a book’, ’paint a house’) and Achievements (e.g.
’win a race’, ’reach the summit’), from Processes (also dynamic events, but atelic, corresponding to Vendler’s Activities, like ’sleep’, ’swim’) and States (non-dynamic events, like ’be tall’, ’like’, ’live’). In most cases, temporally successive readings (which subsume the After and Before TLinks) involve two Transitions, or at least one Transition (cf. Table 2). For example, regarding TLink_Before, there were 284 cases (50.27%) involving only Transitions, while with TLink_After there were 422 cases (61.26%). On the contrary, TLink_Before connects Transitions and States in only 25.75% of the cases and Transitions and Processes in 12.21% of the cases, whereas TLink_After connects Transitions and States in 21.48% of the cases and Transitions and Processes in 14.8% of the cases. The number of cases in which both situations are non-telic, i.e., Processes or States, is considerably smaller (cf. Table 2). For instance, the cases involving only States were 22 (3.89%) (TLink_Before) and 15 (2.18%) (TLink_After), while those involving only Processes were 10 (1.77%) (TLink_Before) and 8 (1.16%) (TLink_After). In summary, although it is not a categorical distinction, there is a clear correlation between telicity and temporal successivity.

Simultaneity readings seem to be correlated with another aspectual property – duration. States and Processes are durative; Transitions may or may not be durative (as this tag subsumes two aspectual types: Accomplishments, which are durative, and Achievements, which are non-durative (cf. [39])). Simultaneity readings are related to three TLinks: TLink_Is_Included, TLink_Includes and TLink_Simultaneous. The first two links show a clear preference for including Transitions in the time interval of States, as presented in Table 2 (58.55% with the TLink_Includes and 32.59% with the TLink_Is_Included).
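The combined percentages quoted above for Before and After are the sums of the two orderings of each pair of types in Table 2 (for instance, 13.98% + 11.77% = 25.75% for Transitions and States under Before). A small sketch, with the percentages copied from Table 2 and a hypothetical helper name, reproduces this arithmetic:

```python
# Percentages copied from Table 2, keyed by (type of first event, type of second event).
TABLE_2_PCT = {
    ("Transition", "Transition"): {"After": 61.26, "Before": 50.27, "Simultaneous": 36.16,
                                   "Includes": 6.51, "Is_Included": 16.30},
    ("Transition", "State"): {"After": 15.09, "Before": 13.98, "Simultaneous": 10.68,
                              "Includes": 2.93, "Is_Included": 32.59},
    ("Transition", "Process"): {"After": 8.56, "Before": 5.13, "Simultaneous": 5.75,
                                "Includes": 0.98, "Is_Included": 11.11},
    ("State", "Transition"): {"After": 6.39, "Before": 11.77, "Simultaneous": 20.00,
                              "Includes": 58.55, "Is_Included": 5.93},
    ("Process", "Transition"): {"After": 6.24, "Before": 7.08, "Simultaneous": 7.40,
                                "Includes": 6.84, "Is_Included": 2.96},
}


def combined_pct(type_a, type_b, tlink):
    """Share of a TLink joining the two types, regardless of their order."""
    # A set collapses (X, X) into a single key, so same-type pairs are not doubled.
    pairs = {(type_a, type_b), (type_b, type_a)}
    return round(sum(TABLE_2_PCT[p][tlink] for p in pairs if p in TABLE_2_PCT), 2)


before_state = combined_pct("Transition", "State", "Before")      # 13.98 + 11.77
before_process = combined_pct("Transition", "Process", "Before")  # 5.13 + 7.08
after_state = combined_pct("Transition", "State", "After")        # 15.09 + 6.39
after_process = combined_pct("Transition", "Process", "After")    # 8.56 + 6.24
```

Pairs that do not appear in Table 2, such as two States, are simply skipped by the helper, so it covers only the five type combinations the table reports.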
On the contrary, the numbers involving only Transitions (i.e., where a Transition is located in the time interval of another Transition) are considerably lower (cf. Table 2: 6.51% with the TLink_Includes and 16.3% with the TLink_Is_Included). However, TLink_Simultaneous presents different results from the previous ones, as the most significant number of cases involves two Transitions (36.16%) (cf. Table 2). The examples in which Transitions are simultaneous with States correspond to 30.68% of the cases, while those in which Transitions are simultaneous with Processes correspond to only 13.15% (cf. Table 2). The relatively high number of cases of two Transitions related by TLink_Simultaneous is likely related to the fact that, in the news, different situations are mentioned that are concomitant with the central situation of the news, which is described in the lead. To sum up, although there are no absolute restrictions, it appears that simultaneity readings are related to durative situations, particularly when the TLinks Is_Included and Includes are involved.⁴

4. Conclusions
The purpose of the study presented in this paper was to investigate the temporal structure of news articles at two levels: the story and report levels. Three research questions were formulated. With respect to the first research question (How are the events temporally organised at the story level?), our analysis revealed that, in the majority of cases, the journalist repeats the same event introduced in the lead throughout the news, resulting in the most frequent TLink Identity. However, the additional information provided by the journalist about the main event tends to follow a chronological order, which is why TLink After is also frequently observed. Regarding RQ2, which aims to investigate the role of the report level in the temporal organisation of news stories, we can conclude that reporting events hold significant importance in this type of news article.
Though not all events in the story are introduced by reporting events, when they are, the most frequent relation within each block of the reporting event is simultaneity. Therefore, there is a somewhat distinct pattern of temporal organisation depending on whether we consider each reporting event or the entire news article. In relation to our third research question (RQ3), which concerns the relationship between the temporal relations and the grammatical characteristics of events, our study indicates that Type is the most significant attribute. Additionally, we have observed a correlation between telic situations and temporal succession, and between durative situations and temporal simultaneity.

In summary, our linguistic analysis provided valuable insights into how news articles convey events. Studying linguistic features, especially those that determine temporal relations, yields information that can be used to improve models that retrieve or predict temporal information. The visualisations generated by the pipeline we developed were crucial in automatically generating temporal relations, verifying manual annotations, and analysing temporal structures. Our study has provided a comprehensive understanding of the narrative temporal structure of news articles, contributing to the field of narrative analysis.

⁴ This same correlation between temporal readings and aspectual properties, in particular the fact that telicity triggers temporal succession whereas durativity triggers simultaneity, has also been pointed out for independent reasons in other works (cf. [40] for the analysis of adverbial perfect participle clauses in Portuguese and English).

Acknowledgments

This work is financed by National Funds through the FCT - Fundação para a Ciência e a Tecnologia, I.P. (Portuguese Foundation for Science and Technology) within the project StorySense, with reference 2022.09312.PTDC.
It was also funded by national funds through FCT - Fundação para a Ciência e a Tecnologia, I.P., within the project UIDB/00022/2020.

References

[1] A. Leeuwenberg, M.-F. Moens, Temporal information extraction by predicting relative time-lines, in: Proceedings of the EMNLP’18, ACL, Brussels, Belgium, 2018, pp. 1237–1246.
[2] B. Santana, R. Campos, E. Amorim, A. Jorge, P. Silvano, S. Nunes, A survey on narrative extraction from textual data, Artif Intell Rev (2023).
[3] I. Zahid, H. Zhang, F. Boons, R. Batista-Navarro, Towards the automatic analysis of the structure of news stories, CEUR Workshop Proceedings 2342 (2019) 71–79. 2nd International Workshop on Narrative Extraction From Texts, Text2Story 2019; Conference date: 14-04-2019.
[4] A. Bell, The language of news media, Blackwell, 1991.
[5] J. Chovanec, Pragmatics of Tense and Time in News, Pragmatics & Beyond New Series, John Benjamins Publishing Company, Amsterdam, 2014.
[6] P. Silvano, E. Amorim, A. Leal, I. Cantante, F. Silva, A. Jorge, R. Campos, S. Nunes, Annotation and Visualisation of Reporting Events in Textual Narratives, in: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (Eds.), CEUR Workshop Proceedings, volume 3370 of CEUR Workshop Proceedings, 2023, pp. 47–62.
[7] J. C. Harry, Journalistic quotation: Reported speech in newspapers from a semiotic-linguistic perspective, Journalism 15 (2014) 1041–1058.
[8] S. Pareti, Towards a discourse resource for Italian: developing an annotation schema for attribution, Master’s thesis, Faculty of Letters and Philosophy, University of Pavia, Italy, 2009.
[9] G. Palshikar, S. Pawar, S. Patil, S. Hingmire, N. Ramrakhiyani, H. Bedi, P. Bhattacharyya, V. Varma, Extraction of Message Sequence Charts from Narrative History Text, in: Proceedings of the First Workshop on Narrative Understanding, 2019, pp. 28–36.
[10] S. Hingmire, N. Ramrakhiyani, A. K. Singh, S. Patil, G. Palshikar, P. Bhattacharyya, V. Varma, Extracting Message Sequence Charts from Hindi Narrative Text, in: Proceedings of the First Joint Workshop on Narrative Understanding, Storylines, and Events, 2020, pp. 87–96.
[11] S. Liu, Y. Wu, E. Wei, M. Liu, Y. Liu, Storyflow: Tracking the evolution of stories, IEEE Transactions on Visualization and Computer Graphics 19 (2013) 2436–2445.
[12] Q. Chen, S. Cao, J. Wang, N. Cao, How does automation shape the process of narrative visualization: A survey of tools, IEEE Transactions on Visualization and Computer Graphics (2023).
[13] ISO-24617-1, Language resource management - Semantic annotation framework (SemAF) - Part 1: Time and events (SemAF-Time, ISO-TimeML), Standard, Geneva, CH, 2012.
[14] A. Bell, News time, Time & Society 4 (1995) 305–328. doi:10.1177/0961463X95004003003.
[15] C. Schokkenbroek, News stories: Structure, time and evaluation, Time & Society 8 (1999) 59–98. doi:10.1177/0961463X99008001004.
[16] J. Sanders, K. van Krieken, Traveling through narrative time: How tense and temporal deixis guide the representation of time and viewpoint in news narratives, Cognitive Linguistics 30 (2019) 281–304. doi:10.1515/cog-2018-0041.
[17] I. Mani, B. Schiffman, Temporally anchoring and ordering events in news, 2004.
[18] A. Setzer, Temporal Information in Newswire Articles: an Annotation Scheme and Corpus Study, Ph.D. thesis, University of Sheffield, 2001.
[19] E. Filatova, E. Hovy, Assigning time-stamps to event-clauses, in: Proceedings of the Workshop on Temporal and Spatial Information Processing - Volume 13, TASIP ’01, Association for Computational Linguistics, USA, 2001, pp. 1–8. doi:10.3115/1118238.1118250.
[20] J. Pustejovsky, R. Ingria, R. Saurí, J. M. Castaño, J. Littman, R. J. Gaizauskas, A. Setzer, G. Katz, I. Mani, The specification language TimeML, in: I. Mani, J. Pustejovsky, R. J. Gaizauskas (Eds.), The Language of Time - A Reader, Oxford University Press, 2005, pp. 545–558.
[21] J. Pustejovsky, K. Lee, H. Bunt, L. Romary, ISO-TimeML: An international standard for semantic annotation, in: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10), European Language Resources Association (ELRA), Valletta, Malta, 2010. URL: http://www.lrec-conf.org/proceedings/lrec2010/pdf/55_Paper.pdf.
[22] F. Costa, Processing Temporal Information in Unstructured Documents, Ph.D. thesis, Faculdade de Ciências da Universidade de Lisboa, 2012.
[23] R. Campos, Disambiguating Implicit Temporal Queries for Temporal Information Retrieval Applications, Ph.D. thesis, Faculdade de Ciências da Universidade do Porto, 2013.
[24] R. Campos, G. Dias, A. M. Jorge, C. Nunes, Identifying top relevant dates for implicit time sensitive queries, Information Retrieval Journal 20 (2017) 363–398. URL: https://api.semanticscholar.org/CorpusID:254583715.
[25] Q. Ning, H. Wu, D. Roth, A multi-axis annotation scheme for event temporal relations, in: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Melbourne, Australia, 2018, pp. 1318–1328. URL: https://aclanthology.org/P18-1122. doi:10.18653/v1/P18-1122.
[26] H. Sousa, R. Campos, A. Jorge, Tei2go: A multilingual approach for fast temporal expression identification, in: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 5401–5406. doi:10.1145/3583780.3615130.
[27] H. Sousa, R. Campos, A. M. Jorge, Tieval: An evaluation framework for temporal information extraction systems, in: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’23, Association for Computing Machinery, New York, NY, USA, 2023, pp. 2871–2879. doi:10.1145/3539618.3591892.
[28] R. Campos, G. Dias, A. M. Jorge, A. Jatowt, Survey of temporal information retrieval and related applications, ACM Comput. Surv. 47 (2014). doi:10.1145/2619088.
[29] A. Rogers, G. Smelkov, A. Rumshisky, Narrative time: Dense high-speed temporal annotation on a timeline, CoRR abs/1908.11443 (2019). URL: arxiv.org/abs/1908.11443. arXiv:1908.11443.
[30] L. Derczynski, Automatically Ordering Events and Times in Text, 2017.
[31] ISO-24617-4, Language resource management - Semantic annotation framework (SemAF) - Part 4: Semantic roles (SemAF-SR), Standard, Geneva, CH, 2014.
[32] ISO-24617-7, Language resource management - Semantic annotation framework (SemAF) - Part 7: Spatial information, Standard, Geneva, CH, 2020.
[33] ISO-24617-9, Language resource management - Semantic annotation framework (SemAF) - Part 9: Reference annotation framework (RAF), Standard, Geneva, CH, 2019.
[34] P. Silvano, A. Leal, F. Silva, I. Cantante, F. Oliveira, A. Mario Jorge, Developing a multilayer semantic annotation scheme based on ISO standards for the visualization of a newswire corpus, in: Proceedings of the 17th Joint ACL - ISO Workshop on Interoperable Semantic Annotation, Association for Computational Linguistics, Groningen, The Netherlands (online), 2021, pp. 1–13. URL: https://aclanthology.org/2021.isa-1.1.
[35] A. Leal, P. Silvano, E. Amorim, I. Cantante, F. Silva, A. Mario Jorge, R. Campos, The place of ISO-space in Text2Story multilayer annotation scheme, in: Proceedings of the 18th Joint ACL - ISO Workshop on Interoperable Semantic Annotation within LREC2022, European Language Resources Association, Marseille, France, 2022, pp. 61–70. URL: https://aclanthology.org/2022.isa-1.8.
[36] L. Deleger, Q. Li, T. Lingren, M. Kaiser, K. Molnar, L. Stoutenborough, M. Kouril, K. Marsolo, I. Solti, Building gold standard corpora for medical natural language processing tasks, AMIA ... Annual Symposium proceedings. AMIA Symposium 2012 (2012) 144–153. URL: https://europepmc.org/articles/PMC3540456.
[37] A. Brandsen, S. Verberne, M. Wansleeben, K. Lambers, Creating a dataset for named entity recognition in the archaeology domain, in: N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2020, pp. 4573–4577. URL: https://aclanthology.org/2020.lrec-1.562.
[38] L. J. V. Miranda, Developing a named entity recognition dataset for Tagalog, 2023. arXiv:2311.07161.
[39] Z. Vendler, Verbs and times, The Philosophical Review 66 (1957) 143–160.
[40] P. Silvano, A. Leal, J. Cordeiro, On adverbial perfect participial clauses in Portuguese varieties and British English, Romance Languages and Linguistic Theory 2018: Selected papers from ‘Going Romance’ 32, Utrecht 357 (2021) 263–286.

A. An Example from our Dataset

A.1. The news article

A GNR feriu ligeiramente com “bagos de borracha” (balas de borracha) um homem em Cabeceiras de Basto que estava a fazer disparos para o ar com munições reais e que se recusou a parar, disse hoje fonte policial. O caso deu-se na freguesia de Refojos de Basto, concelho de Cabeceiras de Basto, distrito de Braga, um pouco antes das 00:00. O homem, de 52 anos, foi atingido pelas balas de borracha nas pernas. Fonte do Comando Territorial da GNR em Braga disse à agência Lusa que os operacionais só atingiram o homem depois de este desobedecer à ordem de cessar os disparos. “Não parou sequer quando os militares fizeram disparos de intimidação para o ar”, acrescentou a fonte, que afirmou desconhecer as causas do comportamento do detido. Depois de tratado a ferimentos ligeiros em unidade hospitalar, foi detido no posto da GNR em Cabeceiras e será hoje presente a juiz de instrução para fixação das medidas de coação tidas por convenientes.
Como o caso envolveu armas de fogo, a Polícia Judiciária vai ser informada de detalhes, indicou a fonte.

The GNR slightly wounded with “rubber bullets” a man in Cabeceiras de Basto who was shooting into the air with live ammunition and refused to stop, a police source said today. The incident took place in the parish of Refojos de Basto, in the municipality of Cabeceiras de Basto, Braga district, just before 00:00. The 52-year-old man was hit by rubber bullets in the legs. A source from the GNR’s Territorial Command in Braga told Lusa news agency that the officers only hit the man after he disobeyed the order to stop firing. “He didn’t even stop when the soldiers fired intimidating shots into the air,” added the source, who said he didn’t know the cause of the detainee’s behaviour. After being treated for minor injuries at a hospital, he was detained at the GNR post in Cabeceiras and will be brought before an investigating judge today for the coercive measures deemed appropriate. As the case involved firearms, the Judicial Police will be informed of the details, the source said.

A.2. The temporal structure representations

A.2.1. Bubble representation

Figure 1: The bubbles follow a chronological order. The first big bubble, representing a reporting event, is positioned at noon, and the subsequent big bubbles, which follow the hourly pattern, represent reporting events that occur later. This allows us to discern the sequence of reporting events in the text based on the order of the big bubbles. Each reporting event also contains events within it. These events are the ones that have been declared or reported by someone, and they are also arranged chronologically, like the big bubbles. Finally, the temporal relationships between these events are depicted through arrows connecting the bubbles or rectangles. In this example, the first reporting event is disse (‘said’), and the number “4.0” indicates that this event is in the fourth sentence of the document.
Also, attached to this event are the following events: feriu (‘wounded’), fazer (‘make’), disparos (‘shots’), recusou (‘refused’), parar (‘stop’). The dashed arrows connect events inside the same big bubble, and the solid arrows connect events in different big bubbles.

A.2.2. MSC representation

Figure 2: This Message Sequence Chart (MSC) illustrates the sequence of events within the story layer. Each colour corresponds to a specific class of events: orange for the Occurrence class, green for the I_Action class, light purple for the State class, and blue for reporting events. The events are arranged from left to right to reflect their sequential order in the narrative. Additionally, the diagram depicts the temporal relations between these events.
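
As an illustration of how a chronological arrangement such as the one shown in Figures 1 and 2 could be derived from annotated links, the following minimal Python sketch orders events with a topological sort over Before/After TLinks. It is not the pipeline used in this work: the function name `chronological_order`, the triple format, and the three links themselves are assumptions made for this example, loosely based on the news article in A.1 rather than taken from the annotated dataset.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical TLinks for illustration only (not the dataset's annotations).
# Each triple reads: source event, relation, target event.
TLINKS = [
    ("recusou", "Before", "feriu"),   # the refusal precedes the wounding
    ("feriu", "Before", "disse"),     # the wounding precedes the report
    ("disse", "After", "disparos"),   # the report follows the shots
]


def chronological_order(tlinks):
    """Return one event order consistent with the Before/After links."""
    predecessors = {}
    for source, relation, target in tlinks:
        if relation == "Before":
            earlier, later = source, target
        elif relation == "After":
            earlier, later = target, source
        else:
            # Other relations (e.g. Simultaneous) impose no strict order
            # in this sketch; the events are only registered as nodes.
            predecessors.setdefault(source, set())
            predecessors.setdefault(target, set())
            continue
        predecessors.setdefault(earlier, set())
        predecessors.setdefault(later, set()).add(earlier)
    # TopologicalSorter raises CycleError if the links are inconsistent.
    return list(TopologicalSorter(predecessors).static_order())


order = chronological_order(TLINKS)
print(order)
```

Any order returned respects every Before/After link. Events that the links leave unordered relative to each other (here, disparos with respect to recusou and feriu) may appear in either position, and inconsistent, cyclic links raise `graphlib.CycleError`.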