Event Extraction Alone Is Not Enough

Junbo Huang1,∗, Longquan Jiang1, Cedric Möller1 and Ricardo Usbeck2,∗

1 University of Hamburg, Department of Computer Science, Vogt-Kölln-Straße 30, 22527 Hamburg, Germany
2 Leuphana University Lüneburg, Universitätsallee 1, 21335 Lüneburg, Germany

Abstract
With the growing amount of online data, distinguishing between similar events and the news about them poses a significant challenge for both companies and crisis reaction units. To discriminate event instances, we present Eventist, a silver-standard event instance dataset built from English-language news, containing 23,304 news headlines from 90 countries and covering 113 storm-related events between January 1, 2021, and September 1, 2023. Sampled data is validated by two human raters. Additionally, we propose adopting a sentence-level event representation for modeling media narrative discourse. Finally, we provide two pairwise comparison benchmarks, on event deduplication and event temporal ordering, to make event extraction more practical.

Keywords
Event Deduplication, Event Temporal Ordering, Media Narrative Discourse

1. Introduction

With climate change, a noticeable uptick in the frequency and severity of tropical storms and their associated impacts has been observed [1, 2]. Notably, storm surges, including extreme floods and tsunamis, often cause more damage than the storms themselves. Automatically identifying storm events could enable fast crisis response. Consequently, event extraction from media texts has emerged as a pivotal task in Natural Language Processing (NLP) [3, 4]. We view media texts as narratives describing specific events together with their often implicit causal or temporal relations, and we treat news headlines as compressed narratives containing partial event information.
An event, in this context, denotes the real-world occurrence of a particular incident, characterized by constituent elements such as participants, temporal attributes, and geographical location, and often resulting in a change of state of a set of geopolitical entities (GPEs) [5]. In this study, we represent events with English-language news headlines, with a focus on storm-related narratives. Specifically, storm-related narratives are texts with at least one storm-related event mention. In NLP, events are referred to as word-based event mentions, typically represented as verbs or nouns [3, 4, 6]. For instance, in the headline “Latest from the Tropics: Tropical Storm Bret weakens while Cindy strengthens”, the extracted word-based event mentions are “Tropical Storm Bret” (formed on June 19, 2023) and “Cindy” (formed on June 27, 2023). We argue that recognizing narrative discourse, which involves nuances in news storytelling, is significant. In this example, the narrative’s temporal focus shifts from the decline of tropical storm Bret to the emergence of tropical storm Cindy. We consider the self-referential change of state, with one storm weakening and the other strengthening, as the effect of an event, represented by a single news headline containing two word-based event mentions.

In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (eds.): Proceedings of the Text2Story’24 Workshop, Glasgow (Scotland), 24-March-2024
∗ Corresponding author.
Email: junbo.huang@uni-hamburg.de (J. Huang); longquan.jiang@uni-hamburg.de (L. Jiang); cedric.möller@uni-hamburg.de (C. Möller); ricardo.usbeck@leuphana.de (R. Usbeck)
ORCID: 0000-0002-3192-5896 (J. Huang); 0000-0002-7333-2589 (L. Jiang); 0000-0001-6700-3482 (C. Möller); 0000-0002-0191-7211 (R. Usbeck)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

(a) Event distribution over temporally overlapping events. Note that multiple events can have a compound effect on a set of geopolitical entities. Such a compound effect cannot be captured by extracting only word-based event mentions.
(b) Sentence-level event representation of 2023 Cyclone Gabrielle in different stages of event development. This representation captures event attributes, the effect of events, and the telling of the events, known as narrative discourse.
Figure 1: Distribution of Events and Illustration of Different Media Narratives Over Time

A significant yet often overlooked challenge lies in determining whether different news articles pertain to the same event. Discrepancies between word-based event mentions and events present a formidable obstacle known as event instance discrimination [7, 8, 9]. Despite its critical importance, there exists no standardized benchmark or task formulation to address this issue. Prior attempts have involved entity matching based on factors such as location and time [7, 8, 10], or linking event mentions to entries in a knowledge graph (KG) like Wikidata [9]. However, heuristic-driven methods often suffer from poor generalization, while KG-driven approaches cannot represent events absent from the KG. More importantly, extracting only word-based event mentions fails to represent the effect of events. Furthermore, annotating mentions at the word level is a costly process that often results in low agreement among annotators [11, 12, 13]. Evidence suggests that word-based event mentions are not crucial for effective event detection [14, 15]. In our approach, we use a sentence-level event representation, as illustrated in Figure 1b.
Sentence-level event representation can better capture (1) event attributes, (2) the effect of events, and (3) the telling of the events, also known as narrative discourse [16]. Narrative discourse varies before, during, and after the event, which is practically important in many disciplines [3, 4, 17, 18].

To distinguish event instances in news streams (Figure 1a), we draw inspiration from the success of deep distance metric learning [19, 20] and propose a baseline system for two pairwise comparison tasks. Our contributions include:

• Introducing a large-scale silver-standard event instance dataset named Eventist. The dataset comprises 23,304 English news headlines, covering a total of 113 storm-related real-world events from 90 countries between January 1, 2021, and September 1, 2023.
• Providing two benchmarks on event deduplication and event temporal ordering.

We have made the dataset, baseline models, and code openly accessible at https://github.com/semantic-systems/paper-event-instance-discrimination.

2. Related Work

Dataset. To our knowledge, the most closely related dataset of crisis-related news articles linked to KGs is Crisisfacts [4], initially designed for extracting atomic facts for temporal summarization. Events are linked to Wikidata, and publication dates are available as meta-information. It is theoretically possible to reformulate the temporal summarization task as event deduplication and event temporal ordering. However, as shown in Table 1, Crisisfacts encompasses only 8 events spread across 31 unique dates (all confined to the United States), limiting its applicability. In contrast, Eventist contains news headlines describing 113 different events, covering 688 unique dates and a significantly longer narrative duration.

Table 1
Dataset Descriptives.
Dataset      #Events  #Dates  #News   Mean Duration  Max Duration  Min Duration
Crisisfacts  8        31      41,147  3.88           7             2
Eventist     113      688     23,274  10.69          71            2

Methods for Event Deduplication. Heuristic-based event deduplication methods first extract entities from texts and then use them as features for event deduplication. Ai et al. [10] deployed a graph-based event merging strategy. Zavarella et al. [8] considered multiple metrics measuring temporal similarity, string similarity, and entity similarity. Rollo and Po [7] utilized external entity information in a KG. KG-based approaches aim to directly link word-based event mentions to Wikipedia [9]. While heuristic-based approaches can only identify texts of high similarity, KG-based approaches fail to capture events not included in the KG.

Representation of Event Temporal Relations. Freksa’s cognitive perspective on time and temporal reasoning [21] offers a simplified version of Allen’s 13 interval-based temporal relations [22]. Unlike Allen’s approach, which presents a compositionally complete framework for temporal reasoning, Freksa’s semi-interval-based representation acknowledges the uncertainty inherent in event boundaries and measures temporal relations based on the occurrence of events. While Allen’s model may be theoretically robust, it may not always align with the intricacies of narrative studies. In this work, the temporality of events refers to the narrated time of events rather than their actual occurrence. Viewing a narrative as a point in time offers a coherent perspective: a narrative encapsulates a specific temporal snapshot rather than a comprehensive depiction of events as they unfold in the real world.

3. Dataset Construction

Figure 2 shows the dataset construction pipeline. It comprises news acquisition, denoising, headline-pair sampling, and validation.

Figure 2: Dataset Construction Pipeline.
News Acquisition. News articles are initially collected from the Global Database of Events, Language, and Tone (GDELT) [3], using storm-related keywords such as storm, flood, hurricane, typhoon, and tornado. The search spans January 1, 2021, to September 1, 2023, and is restricted to English, yielding 1,978,483 raw articles from 149 countries. After removing duplicate entries (50.28% of the raw articles), 983,775 unique news articles remain. Our preliminary studies showed that the regular-expression-based GDELT search includes pages where at least one keyword appears in the title, body, image caption, or advertisements. To refine the selection to storm-related news, additional steps are taken, including cosine similarity-based clustering, event type annotation, entity recognition, and temporal clustering.

3.1. Denoising Pipeline

The denoising steps are designed to construct a dataset with high precision. This entails ensuring that each cluster uniquely refers to a specific event, and that all headlines within any cluster reference the same event, despite potential narrative variations.

Clustering on headlines. To generate representations for event mentions, we employ a pre-trained sentence transformer (all-MiniLM-L6-v2, footnote 1). Subsequently, we calculate a distance matrix $d \in \mathbb{R}^{n \times n}$ based on cosine similarity over all pairs of the $n$ headlines. Clusters are established with two criteria: 1) each cluster must comprise a minimum of 50 instances, and 2) cosine similarities between every pair of instances must exceed 0.7. This process results in the retrieval of 786,944 news instances, organized into 264 clusters.

Annotation on Event Type and GPE. We used the Text REtrieval Conference Incident Stream (TREC-IS) dataset to train an event type classifier for soft labeling clusters. TREC-IS is a collection of microblog posts about pre-disaster, in-disaster, and post-disaster discussions [3]. It contains gold-standard annotation of 118 events.
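The two criteria stated under "Clustering on headlines" (a minimum cluster size and pairwise cosine similarity above a threshold) can be illustrated with a small sketch. This is not the authors' implementation: the greedy procedure, the function names, and the toy three-dimensional vectors are assumptions made for illustration, and the minimum size is lowered from 50 to 2 so that the toy data produces clusters. Only the 0.7 threshold and the two criteria come from the text.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def greedy_threshold_clusters(
    embeddings: Sequence[Sequence[float]],
    threshold: float = 0.7,
    min_size: int = 50,
) -> List[List[int]]:
    """Greedily build clusters in which every pair exceeds `threshold`.

    Hypothetical procedure: the paper states the criteria, not the algorithm.
    """
    unassigned = list(range(len(embeddings)))
    clusters: List[List[int]] = []
    while unassigned:
        seed = unassigned.pop(0)
        members = [seed]
        for idx in list(unassigned):
            # Admit idx only if it is similar enough to every current member.
            if all(
                cosine_similarity(embeddings[idx], embeddings[m]) > threshold
                for m in members
            ):
                members.append(idx)
                unassigned.remove(idx)
        # Discard groups that are too small to count as a cluster.
        if len(members) >= min_size:
            clusters.append(members)
    return clusters


# Toy vectors standing in for sentence embeddings (made up for illustration).
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.0],
    [0.0, 0.0, 1.0],
]
toy_clusters = greedy_threshold_clusters(toy_embeddings, threshold=0.7, min_size=2)
print(toy_clusters)
```

In this toy run the fifth vector has no sufficiently similar neighbour, so it is dropped by the minimum-size criterion, which mirrors how headlines outside any large, coherent cluster are filtered out.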
Clusters are chosen if the majority of headlines are predicted as tropical storms, hurricanes, floods, typhoons, or tornadoes. Additionally, spaCy’s NER (en_core_web_md, footnote 2) is used to annotate GPEs. Given the anticipated variations in narrative discourse within each cluster, differences in GPEs among headlines belonging to the same cluster are expected. The annotated GPEs play a crucial role in the subsequent merging of clusters.

Footnote 1: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
Footnote 2: https://spacy.io/models/en#en_core_web_md

Temporal Clustering. For each cluster, a one-dimensional temporal clustering algorithm (DBSCAN) is employed to eliminate temporal outliers, with temporal information represented by the news publication date. We adopted a temporal granularity of one day, setting min_samples = 3 and $\epsilon = 1$, resulting in the removal of 752,429 headlines.

Merging Clusters. The remaining 34,515 continuous mentions form 278 clusters representing storm-related disaster instances. These clusters encompass various narratives, including pre-disaster, in-disaster, and post-disaster information. The GPE overlap ratio $r_{ij}$ between any pair of clusters $C_i$ and $C_j$ is computed using Equation 1.

$$r_{ij} = \frac{|C_i \cap C_j|}{\max(|C_i|, |C_j|)} \tag{1}$$

To capture long events such as 2023 Cyclone Freddy (footnote 3), we allow a temporal gap of 10 days between clusters. Two clusters are merged if (1) the GPE overlap ratio satisfies $r_{ij} \geq 0.5$, and (2) the minimum between-cluster temporal distance satisfies $d_{ij} = \min_{c \in C_i, c' \in C_j} d(c, c') \leq 10$. As a result, 263 unique clusters are returned. To further examine the clustering quality, one domain expert manually linked each cluster to a Wikidata entity of type “occurrence” (footnote 4) and checked for geospatial and temporal consistency between mentions and the actual event occurrences, which yielded 113 unique clusters.

Footnote 3: The longest recorded tropical cyclone.
Footnote 4: https://www.wikidata.org/wiki/Q1190554

3.2. Dataset Validation

We randomly selected 5 headlines from each cluster, resulting in a total of 565 sampled headlines for assessing cluster coherence and uniqueness. All headlines are shown at once. Cluster coherence was evaluated using a 5-point Likert scale, with two human raters indicating the extent to which media narratives described the same event, ranging from “Strongly Disagree” (1) to “Strongly Agree” (5). The mean Likert response score for coherence was $s_{\text{mean}} = 4.03$, with Krippendorff’s alpha $\alpha = 0.736$. The uniqueness of events within each cluster was assessed by determining whether each event was distinct within the set. Cohen’s kappa statistic for uniqueness was $\kappa = 1$.

4. Benchmarks

Event deduplication is a binary classification task (footnote 5) and event temporal ordering is a multiclass classification task (footnote 6). We use a bi-encoder (siamese network structure) and compare four encoders: DistilRoBERTa-base, DistilBERT-base-cased, RoBERTa-base, and BERT-base-cased. Consistent with [19], the two embedded vectors and their element-wise absolute difference are concatenated, $u \oplus v \oplus |u - v|$, and fed into a softmax classification layer. We used the Adam optimizer with a learning rate of 2e−5 and a linear learning-rate warm-up over 10% of the training data. The number of training epochs is 5. The dataset is partitioned to include clusters of varying sizes in each split (Table 2), consisting of 9 large clusters ($N_c \geq 500$), 48 medium-sized clusters ($500 > N_c \geq 200$), and 58 small clusters ($N_c < 200$).

Footnote 5: Two labels for event deduplication: same_event_instance, different_event_instance.
Footnote 6: Three labels for temporal ordering: before, equal, and after.

Table 2
Number of instances of different splits in the final dataset. Parentheses in #Events indicate the number of events from a large/medium/small cluster.
Split  #Events      #Headlines  #Sampled headline pairs  #Sampled headline pairs
                                (Event Deduplication)    (Event Temporal Ordering)
Train  74 (7/33/34) 17,449      610,714                  872,448
Valid  17 (1/7/9)   2,455       73,650                   73,650
Test   24 (1/8/15)  3,570       107,100                  107,100

Within each split, stratified sampling is applied to sample headline pairs, balancing date distribution, label distribution, and event instance distribution in headline pairs. Each model is run with four seeds (0, 1, 2, and 3), and all experiments are run on an NVIDIA RTX A6000 48 GB graphics card. Table 3 shows the benchmark results over these seeds.

Table 3
Benchmark Results.

Model                  Event Deduplication  Event Temporal Ordering
DistilRoBERTa-base     81.90±3.03           55.07±0.24
DistilBERT-base-cased  89.02±4.98           56.59±0.47
RoBERTa-base           77.33±2.37           56.76±0.67
BERT-base-cased        86.14±6.46           55.55±0.29

The baseline results under the pairwise comparison task formulation have two significant implications. Firstly, they demonstrate the feasibility of discriminating event instances using only news headlines, as evidenced by the event deduplication results. This suggests that the denoising pipeline and event type detection mechanisms are effective in distinguishing how different storm-related events are mentioned in the headlines. Secondly, the results highlight the challenge of identifying the temporal ordering of headlines mentioning storm-related events. This task proves to be much more complex than discriminating between individual event instances. The nuanced temporal relationships and contextual dependencies within narratives make it harder to determine the chronological sequence of events discussed in news articles.

5. Limitations and Discussions

5.1. Generalizability of the Proposed Denoising Pipeline

One noticeable ad-hoc component in the denoising pipeline that hinders generalizability is the domain-specific event type detector, trained in a supervised learning fashion.
We trained an event type detector on a gold-standard human-annotated dataset (TREC-IS) and used it to annotate event types, selecting news headlines classified as storm-related. While this approach prioritizes the accuracy of event type annotations, it limits the generalizability to different event types. We highlight the importance of showcasing the practical implications of the relatively under-explored task of event instance discrimination, exemplified by storm-related events. However, advancements in zero-shot learning techniques, such as prompt-based approaches leveraging Large Language Models (LLMs), offer promising avenues to replace the ad-hoc event type detector, potentially enhancing the pipeline’s generalizability across diverse event types. It is crucial to conduct a thorough evaluation of zero-shot event type detectors to ensure the quality of annotations.

5.2. Event Temporal Ordering Beyond News Publication Date

We used news publication dates as features to identify the temporal order of any pair of news headlines. This representation ignores the temporal lag between news publication and the actual occurrence of events. We acknowledge the limitations of this approach, particularly in extracting precise absolute temporal attributes of events. A more accurate method would involve linking event mentions to well-structured data sources such as Knowledge Graphs (KGs) to extract precise temporal attributes. However, the task of event linking remains relatively unexplored, with existing works primarily linking event mentions to community-driven sources like Wikipedia, which may contain inaccurate or false information [9].

Despite the inherent limitations, leveraging news publication dates for temporal ordering provides an approximate timeline of event developments within media narratives.
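As a minimal sketch of how pairwise labels could be derived from publication dates at day granularity, consider the following. The `Headline` record, its field names, the event identifiers, and the toy headlines and dates are hypothetical and are not taken from the released dataset; only the label names (before, equal, after; same_event_instance, different_event_instance) come from the benchmark definition.

```python
from datetime import date
from typing import NamedTuple


class Headline(NamedTuple):
    """Toy record; field names are hypothetical, not the released schema."""
    text: str
    published: date  # publication date at day granularity
    event_id: str    # identifier of the event cluster the headline belongs to


def temporal_order_label(first: Headline, second: Headline) -> str:
    """Three-way label comparing the publication dates of two headlines."""
    if first.published < second.published:
        return "before"
    if first.published > second.published:
        return "after"
    return "equal"


def deduplication_label(first: Headline, second: Headline) -> str:
    """Binary label: do both headlines belong to the same event cluster?"""
    if first.event_id == second.event_id:
        return "same_event_instance"
    return "different_event_instance"


# Made-up headlines and dates, for illustration only.
h1 = Headline("storm approaches the coast", date(2023, 2, 10), "event_a")
h2 = Headline("storm makes landfall", date(2023, 2, 12), "event_a")
h3 = Headline("floods hit another region", date(2023, 2, 12), "event_b")

print(temporal_order_label(h1, h2), deduplication_label(h1, h2))
print(temporal_order_label(h2, h3), deduplication_label(h2, h3))
```

Because the label depends only on the publication date, two headlines about different stages of the same storm published on the same day receive the label equal, which is one concrete form of the lag between narrated time and publication time discussed in this section.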
This approach allows for the construction of a detailed narrative timeline, facilitating the identification of in-disaster news reports and further extraction, normalization, and analysis of temporal expressions.

6. Conclusion

In conclusion, we introduced the Eventist dataset, a comprehensive silver-standard dataset that serves as a benchmark for two essential pairwise comparison tasks: event deduplication and event temporal ordering. While we validated the dataset with human raters, it’s important to note that it may contain noise introduced by biases inherent in the denoising pipeline. However, our focus on event instance discrimination underscores the dataset’s significance and opens avenues for further research in refining and enhancing these critical NLP tasks.

Acknowledgments

The authors acknowledge the financial support by the Federal Ministry for Economic Affairs and Energy of Germany in the project CoyPu (project number 01MK21007G).

References

[1] R. Méndez-Tejeda, J. J. Hernández-Ayala, Links between climate change and hurricanes in the North Atlantic, PLOS Climate 2 (2023) e0000186. URL: https://doi.org/10.1371/journal.pclm.0000186. doi:10.1371/journal.pclm.0000186.
[2] D. Xi, N. Lin, A. Gori, Increasing sequential tropical cyclone hazards along the US East and Gulf coasts, Nature Climate Change 13 (2023) 258–265. URL: https://doi.org/10.1038/s41558-023-01595-7. doi:10.1038/s41558-023-01595-7.
[3] R. McCreadie, C. Buntain, I. Soboroff, TREC incident streams: Finding actionable information on social media, in: Z. Franco, J. J. González, J. H. Canós (Eds.), Proceedings of the 16th International Conference on Information Systems for Crisis Response and Management, València, Spain, May 19-22, 2019, ISCRAM Association, 2019. URL: http://idl.iscram.org/files/richardmccreadie/2019/1867_RichardMcCreadie_etal2019.pdf.
[4] R. McCreadie, C. Buntain, Crisisfacts: Buidling and evaluating crisis timelines, in: 20th International Conference on Information Systems for Crisis Response and Management (ISCRAM 2023), Omaha, NE, USA, 2023, pp. 320–339. URL: https://doi.org/10.59297/JVQZ9405. doi:10.59297/JVQZ9405.
[5] G. R. Doddington, A. Mitchell, M. A. Przybocki, L. A. Ramshaw, S. M. Strassel, R. M. Weischedel, The automatic content extraction (ACE) program - Tasks, Data, and Evaluation, in: Proceedings of the Fourth International Conference on Language Resources and Evaluation, LREC 2004, May 26-28, 2004, Lisbon, Portugal, European Language Resources Association, 2004. URL: http://www.lrec-conf.org/proceedings/lrec2004/summaries/5.htm.
[6] X. Wang, Z. Wang, X. Han, W. Jiang, R. Han, Z. Liu, J. Li, P. Li, Y. Lin, J. Zhou, MAVEN: A massive general domain event detection dataset, in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020, Association for Computational Linguistics, 2020, pp. 1652–1671. URL: https://doi.org/10.18653/v1/2020.emnlp-main.129. doi:10.18653/V1/2020.EMNLP-MAIN.129.
[7] F. Rollo, L. Po, Crime event localization and deduplication, in: J. Z. Pan, V. A. M. Tamma, C. d’Amato, K. Janowicz, B. Fu, A. Polleres, O. Seneviratne, L. Kagal (Eds.), The Semantic Web - ISWC 2020 - 19th International Semantic Web Conference, Athens, Greece, November 2-6, 2020, Proceedings, Part II, volume 12507 of Lecture Notes in Computer Science, Springer, 2020, pp. 361–377. URL: https://doi.org/10.1007/978-3-030-62466-8_23. doi:10.1007/978-3-030-62466-8_23.
[8] V. Zavarella, J. Piskorski, C. Ignat, H. Tanev, M. Atkinson, Mastering the media hype: Methods for deduplication of conflict events from news reports, in: A. M. Jorge, R. Campos, A. Jatowt, A. Aizawa (Eds.), Proceedings of AI4Narratives - Workshop on Artificial Intelligence for Narratives in conjunction with the 29th International Joint Conference on Artificial Intelligence and the 17th Pacific Rim International Conference on Artificial Intelligence (IJCAI 2020), Yokohama, Japan, January 7th and 8th, 2021 (online event due to Covid-19 outbreak), volume 2794 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 29–34. URL: https://ceur-ws.org/Vol-2794/paper6.pdf.
[9] X. Yu, W. Yin, N. Gupta, D. Roth, Event linking: Grounding event mentions to wikipedia, in: A. Vlachos, I. Augenstein (Eds.), Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, Dubrovnik, Croatia, May 2-6, 2023, Association for Computational Linguistics, 2023, pp. 2671–2680. URL: https://doi.org/10.18653/v1/2023.eacl-main.196. doi:10.18653/V1/2023.EACL-MAIN.196.
[10] W. Ai, J. Xu, H. Shao, Z. Wang, T. Meng, An entity event deduplication method based on connected subgraph, in: J. Yang, K. Li, W. Tu, Z. Xiao, L. Wang (Eds.), 7th International Conference on Systems and Informatics, ICSAI 2021, Chongqing, China, November 13-15, 2021, IEEE, 2021, pp. 1–6. URL: https://doi.org/10.1109/ICSAI53574.2021.9664040. doi:10.1109/ICSAI53574.2021.9664040.
[11] O. Inel, L. Aroyo, Validation methodology for expert-annotated datasets: Event annotation case study, in: M. Eskevich, G. de Melo, C. Fäth, J. P. McCrae, P. Buitelaar, C. Chiarcos, B. Klimek, M. Dojchinovski (Eds.), 2nd Conference on Language, Data and Knowledge, LDK 2019, May 20-23, 2019, Leipzig, Germany, volume 70 of OASIcs, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019, pp. 12:1–12:15. URL: https://doi.org/10.4230/OASIcs.LDK.2019.12. doi:10.4230/OASICS.LDK.2019.12.
[12] Z. Song, A. Bies, S. M. Strassel, J. Ellis, T. Mitamura, H. T. Dang, Y. Yamakawa, S. Holm, Event nugget and event coreference annotation, in: M. Palmer, E. H. Hovy, T. Mitamura, T. O’Gorman (Eds.), Proceedings of the Fourth Workshop on Events, EVENTS@HLT-NAACL 2016, San Diego, California, USA, June 17, 2016, Association for Computational Linguistics, 2016, pp. 37–45. URL: https://doi.org/10.18653/v1/W16-1005. doi:10.18653/V1/W16-1005.
[13] C. Colruyt, O. D. Clercq, T. Desot, V. Hoste, EventDNA: A dataset for dutch news event extraction as a basis for news diversification, Lang. Resour. Evaluation 57 (2023) 189–221. URL: https://doi.org/10.1007/s10579-022-09623-2. doi:10.1007/S10579-022-09623-2.
[14] S. Liu, Y. Li, F. Zhang, T. Yang, X. Zhou, Event detection without triggers, in: J. Burstein, C. Doran, T. Solorio (Eds.), Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Volume 1 (Long and Short Papers), Association for Computational Linguistics, 2019, pp. 735–744. URL: https://doi.org/10.18653/v1/n19-1080. doi:10.18653/V1/N19-1080.
[15] T. Ling, L. Chen, H. Sheng, Z. Cai, H. Liu, Sentence-level event detection without triggers via prompt learning and machine reading comprehension, CoRR abs/2306.14176 (2023). URL: https://doi.org/10.48550/arXiv.2306.14176. doi:10.48550/ARXIV.2306.14176. arXiv:2306.14176.
[16] A. Piper, R. J. So, D. Bamman, Narrative theory for computational narrative understanding, in: M. Moens, X. Huang, L. Specia, S. W. Yih (Eds.), Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021, Virtual Event / Punta Cana, Dominican Republic, 7-11 November, 2021, Association for Computational Linguistics, 2021, pp. 298–311. URL: https://doi.org/10.18653/v1/2021.emnlp-main.26. doi:10.18653/V1/2021.EMNLP-MAIN.26.
[17] S. Diaf, J. Döpke, U. Fritsche, I. Rockenbach, Sharks and minnows in a shoal of words: Measuring latent ideological positions based on text mining techniques, European Journal of Political Economy 75 (2022) 102179. URL: https://doi.org/10.1016/j.ejpoleco.2022.102179. doi:10.1016/j.ejpoleco.2022.102179.
[18] T. Zhang, A. M. Schoene, S. Ji, S. Ananiadou, Natural language processing applied to mental illness detection: A narrative review, npj Digit. Medicine 5 (2022). URL: https://doi.org/10.1038/s41746-022-00589-7. doi:10.1038/S41746-022-00589-7.
[19] N. Reimers, I. Gurevych, Sentence-BERT: Sentence embeddings using siamese BERT-networks, in: K. Inui, J. Jiang, V. Ng, X. Wan (Eds.), Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, November 3-7, 2019, Association for Computational Linguistics, 2019, pp. 3980–3990. URL: https://doi.org/10.18653/v1/D19-1410. doi:10.18653/V1/D19-1410.
[20] S. Nishikawa, R. Ri, I. Yamada, Y. Tsuruoka, I. Echizen, EASE: Entity-aware contrastive learning of sentence embedding, in: M. Carpuat, M. de Marneffe, I. V. M. Ruíz (Eds.), Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2022, Seattle, WA, United States, July 10-15, 2022, Association for Computational Linguistics, 2022, pp. 3870–3885. URL: https://doi.org/10.18653/v1/2022.naacl-main.284. doi:10.18653/V1/2022.NAACL-MAIN.284.
[21] C. Freksa, Temporal reasoning based on semi-intervals, Artif. Intell. 54 (1992) 199–227. URL: https://doi.org/10.1016/0004-3702(92)90090-K. doi:10.1016/0004-3702(92)90090-K.
[22] J. F. Allen, Maintaining knowledge about temporal intervals, Commun. ACM 26 (1983) 832–843. URL: https://doi.org/10.1145/182.358434. doi:10.1145/182.358434.