Event Extraction and Discourse Monitoring

Theresa Krumbiegel, Albert Pritzkau and Hans-Christian Schmitz
Fraunhofer Institute for Communication, Information Processing and Ergonomics (FKIE), Fraunhoferstr. 20, 53343 Wachtberg, Germany

Abstract
Event extraction demands the detection of event descriptions in texts and the transformation of such descriptions into a standardised, structured format. It is framed as a natural language understanding task. Media event extraction, in addition, aims at structuring the media space and giving an overview of the topics being discussed and of the dynamics of discourse. We describe approaches to event extraction and media event extraction and discuss how these approaches can be linked in order to support the analysis of diverging world views in media discourse.¹

Keywords
distant reading, event extraction, media events, media space

1. Introduction
Our aim is to explore the media space, in particular social media, in order to investigate the creation of conflicting world views. By “media space”, we refer to the very large, fast and continuously growing multilingual collection of texts, images, video and audio data that are distributed via traditional media as well as social media. Social media include YouTube, Twitter, Facebook, and other platforms [3]. A large part of the media space is accessible via the Internet. It contains huge amounts of “cultural data” relevant for cultural analysis [4]. The media space provides information on the physical world: What happened? Which events are currently ongoing? What is planned or predicted to happen in the future? [5] Besides being a sensor for the physical world, the media space is a forum for ideologies, opinions and values. It is a space for the negotiation of what a society considers to be permissible, prescribed or forbidden, and for acting out sentiment and bias. As such, the media space is a research object for ideology analysis.
If the entire media space is considered a research object, then exhaustive close reading is plainly impossible. Therefore, media space analysis demands the development of suitable distant reading methods. A distant reader focuses on specific features of texts instead of reading the texts completely. By reducing the reading effort in this way, she is able to process large text collections. Focusing on specific features can also support her in finding the most relevant texts or text passages that are to be read closely [6, 7]. Distant reading requires automatic information reduction, which includes the extraction and, possibly, transformation of relevant features from a corpus. Features can be defined as properties or attributes of the underlying data set. There is no definite path to identifying the most relevant ones; only crude guidelines exist. Feature selection is very much guided by the available data and the properties of the task to be solved [8].

¹ This paper is based on our papers “Distant Reading and Event Extraction” [1] and “Conflict Monitoring” [2]. The former can be considered an extended abstract of the present paper. The latter examines the value of extracting and connecting details of real-world and media events from the media space, with the goal of using this information to enhance situational awareness in crisis management. For the present paper, we re-use text modules from both [1] and [2].

Humanities-Centred AI (CHAI), Workshop at the 44th German Conference on Artificial Intelligence, September 28, 2021, Berlin, Germany
theresa.krumbiegel@fkie.fraunhofer.de (T. Krumbiegel); albert.pritzkau@fkie.fraunhofer.de (A. Pritzkau); hans-christian.schmitz@fkie.fraunhofer.de (H.-C. Schmitz)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.
With the objective to extract common and useful patterns from data, feature engineering must be regarded as a very important skill of the researcher [9]. The difficulty of feature engineering and the effort involved are the main reasons for the emergence of algorithms that engineer these feature representations automatically. Feature learning algorithms and tools that find common patterns already exist – based on autoencoders and embeddings – and they are changing every aspect of how relevant features are identified, encoded, and distinguished from irrelevant ones [10]. In the present paper, we discuss two approaches to extracting and presenting information from large text collections and thereby supporting distant reading. The focus of the paper is on the first of these approaches, that is, event extraction. Event extraction involves the recognition of event descriptions and the transformation of these descriptions into a standardized, structured format. The aim is to collect and display the events that the texts are about. In Section 2, we propose a processing pipeline for event extraction. Texts can not only be about events; they can also constitute events themselves. We call such events “media events”. Tentatively, we define a media event as the coherent reporting, commenting or discussion of a certain topic in the media. Media event extraction aims at structuring the media space and giving an overview of the topics being discussed and of the dynamics of discourse. To this end, word-distribution-based topic models and clustering can be applied. We briefly describe a media event extraction approach in Section 3. Finally, in Section 4, we give an outlook on the linking of event and media event extraction and their application to ideology analysis.

2. Event Extraction
An important function of texts is to represent states of affairs and courses of action. A text can be fictional or just false.
Still, it describes a world, even if this world does not agree with what we consider to be real. In order to further analyse the signified, our aim is to extract event descriptions from text and transform these descriptions into standardized, structured representations. Structured event representations can serve as input for visualization and further investigation. The most elementary representation of an event includes the event type, time and location. These can be inserted into a template that defines the structure of the event representation. Types are pre-defined. Times and locations can be specified with varying accuracy – e.g., locations can be given as cities, by addresses, or by coordinates. Besides this basic information, further information on the actor, the affected, the reporter, etc. can be provided, if available. Event extraction has been studied for many years. Due to the continuous development in the field of natural language processing, approaches to event extraction are diverse [11]. Our processing pipeline for automatic event recognition, extraction and display consists of the following steps: First, wrappers collect input data. A wrapper can perform keyword-guided information retrieval and, therefore, function as a first filter. The use of a keyword narrows down the subject area of the corpus. The collection of data can also happen non-specifically, without a keyword – for example, when all recent articles from a specific feed (e.g., of the last two hours) are considered relevant and are included in the corpus accordingly. Which data are used as input can be adapted to the use case; examples are, among others, social media messages (e.g., obtained from Twitter) as well as texts extracted from online newspapers. We restrict ourselves to text analysis and leave images, videos and audio data out of consideration.
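The template for standardized event representations can be sketched, for illustration, as a simple data structure. The field names below are our own illustration, not a fixed schema:

```python
from dataclasses import dataclass, field
from typing import Optional
import uuid

@dataclass
class EventRecord:
    """Standardized event representation (field names are illustrative)."""
    event_type: str                      # pre-defined type, e.g. "peaceful protest"
    time: Optional[str] = None           # varying accuracy, e.g. a date string
    location: Optional[str] = None       # city, address, or coordinates
    actor: Optional[str] = None          # who did it, if available
    affected: Optional[str] = None       # whom it affected, if available
    reporter: Optional[str] = None       # who reported it, if available
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique identifier

# Minimal usage: basic information only, further slots stay empty.
record = EventRecord(event_type="peaceful protest",
                     time="2021-04-16", location="Portland")
print(record.event_type, record.time, record.location)
```

The unique identifier anticipates the template described in the sixth processing step below, where completed records are compared against the database.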
Second, using trained binary classification models, it is decided whether a retrieved document contains information on an event and is, thus, to be further analyzed. This task can be solved on the document level or on the sentence level and can therefore be seen as a two-step process. In the first step, larger entries within the corpus, i.e., entire documents, are analysed and marked as including an event description or not. In the second step, the specific sentences of the documents that include event descriptions are extracted. These sentences are the ones holding the actual event information. The information space is inherently multilingual; therefore, classification models should make use of multilingual language models. In a first experiment to solve the binary classification task, we used one hundred separately trained densely connected neural networks for the document classification task and a single network for classification at the sentence level [12]. Both approaches made use of document embeddings generated with the pre-trained multilingual cased BERT model [13]. Our models were tested on multilingual data sets including English, Spanish, Portuguese and Hindi text instances. For document classification, a macro F1 (F-measure) of 0.65 was reached; for sentence-level classification, a macro F1 of 0.70. We plan to improve these scores by further developing our models and conducting additional experiments. Third, document metadata are extracted and saved for further processing. Metadata of interest are, e.g., the author of the text as well as place and time of creation. The metadata obtained are used for data management: a comparison with existing entries in the data collection is performed, and entries that are already included are not incorporated again and are consequently excluded from further processing. This step is optional, as a further comparison of the texts based on the extracted events is conducted in the following step.
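The binary classification step in this pipeline can be sketched as follows. This is a minimal stand-in, not our actual models: random vectors substitute for the 768-dimensional multilingual BERT document embeddings, and scikit-learn's `MLPClassifier` substitutes for the densely connected networks of [12]:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
dim = 768  # dimensionality of BERT document embeddings

# Synthetic stand-ins for embedded documents: one cloud for documents that
# contain an event description, one for documents that do not.
event_docs = rng.normal(loc=0.5, scale=1.0, size=(50, dim))
other_docs = rng.normal(loc=-0.5, scale=1.0, size=(50, dim))
X = np.vstack([event_docs, other_docs])
y = np.array([1] * 50 + [0] * 50)   # 1 = contains event description

# A small densely connected network acting as the binary filter.
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```

In the real pipeline, the same scheme is applied twice: once over whole documents and once over the sentences of documents flagged as event-bearing.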
However, it can be useful to reduce the amount of data already at this stage to avoid the unnecessary processing of duplicates. In the fourth processing step, different mentions of the same event are determined. Resolving co-references to events can be handled as a clustering task. Further analyzing different mentions supports the detection of contradictions between various mentions of the same event. One approach to solving the clustering problem is to train and optimize a simple neural network to compare two documents that both include an event description; the network basically acts as a comparison function. The results of this comparison can then be used to build a graph in which the documents/sentences are represented by the vertices. If the network predicts that two documents/sentences belong to the same cluster, an edge is added between the corresponding vertices; otherwise, no edge is added. The resulting graph is analysed with regard to disjoint subgraphs. Each individual subgraph represents an event cluster. Such a graph is shown in Figure 1.

Figure 1: Example of a possible graph

In this simplified example graph, four documents including event descriptions are given. After processing these documents as described above, two event clusters are found. The first cluster includes documents 1, 2 and 4; the second cluster includes only document 3. We can determine that documents 1, 2 and 4 are about the same event and that document 3 is about another event. Fifth, a fine-grained event classifier is used to determine the type of the detected event, and additional information, in particular on time and location, is extracted. The event type can be determined with a fine-grained classification model. One database that can be used as training data for such a classifier is the Armed Conflict Location and Event Data Project (ACLED) database [14].
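The graph-based clustering of the fourth step can be sketched as follows: the pairwise "same event" predictions of the comparison network become edges, and the connected components of the resulting graph are the event clusters. The pair list below reproduces the Figure 1 scenario:

```python
def event_clusters(n_docs, same_event_pairs):
    """Group documents into event clusters via connected components.

    n_docs: document ids 1..n_docs are the vertices of the graph.
    same_event_pairs: edges, i.e. pairs the comparison network judged
    to describe the same event.
    """
    adjacency = {i: set() for i in range(1, n_docs + 1)}
    for a, b in same_event_pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, clusters = set(), []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first search collects one disjoint subgraph (= one cluster).
        stack, component = [start], set()
        while stack:
            v = stack.pop()
            if v in component:
                continue
            component.add(v)
            stack.extend(adjacency[v] - component)
        seen |= component
        clusters.append(component)
    return clusters

# Figure 1 scenario: the network links documents 1-2 and 2-4; document 3
# has no edge to any other document.
print(event_clusters(4, [(1, 2), (2, 4)]))  # two clusters: {1, 2, 4} and {3}
```

Documents 1 and 4 end up in the same cluster although the network never compared them directly: the shared edge through document 2 connects them, which is exactly the transitivity that the subgraph analysis contributes.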
The ACLED database contains six event types and 25 sub event types. Fine-grained classification of events aims at the detection of the 25 sub event types. In general, all event and sub event types describe either a violent event (e.g. “battles” and “explosions”), a demonstration (e.g. “protests” and “riots”) or non-violent actions (e.g. “strategic developments”). Our current approach uses a fine-tuned RoBERTa transformer model [15] with document embeddings and ACLED data as its basis [16]. The model was evaluated on non-ACLED data in order to assess its robustness and reached a weighted F1 of 0.830. The model scores high for sub event types that can be seen as (semantically) compact, e.g., “suicide bomb” (F1 0.976) and “remote explosion” (F1 0.957), and low for sub event types that are less compact, e.g., the generic sub event type “other” (F1 0.400). These results were to be expected. To calculate the topic compactness of each sub event type, we first embedded all examples. We then averaged the resulting vectors for each sub type; this average represents the topic centroid. Finally, we calculated the Euclidean distance of each text vector to the topic centroid. Figures 2 and 3 show the topic compactness of the event sub types “other” and “suicide bomb”, respectively. In addition to the event type, i.e. the “What”, the “Where” (location) and the “When” (time) of an event are crucial elements of an event representation. They need to be extracted. To this end, the text entries that remain after the previously described processing steps can be analysed with a Named Entity Recognition (NER) model. Named entities are, for example, locations, organizations, persons and times/dates. A well-functioning NER model can provide relevant information about a given event almost entirely, including information about location and time and even involved actors.
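The topic compactness computation described above can be sketched with NumPy. The synthetic embeddings below merely stand in for the real sentence embeddings of the two sub event types:

```python
import numpy as np

def topic_compactness(embeddings):
    """Mean Euclidean distance of each text embedding to the topic centroid.

    A lower value indicates a semantically more compact sub event type.
    """
    X = np.asarray(embeddings, dtype=float)
    centroid = X.mean(axis=0)                       # average vector of the sub type
    distances = np.linalg.norm(X - centroid, axis=1)
    return distances.mean()

rng = np.random.default_rng(1)
# Tightly grouped embeddings, as for a compact type such as "suicide bomb" ...
compact = rng.normal(scale=0.1, size=(100, 768))
# ... versus widely scattered embeddings, as for the generic type "other".
diffuse = rng.normal(scale=2.0, size=(100, 768))
print(topic_compactness(compact) < topic_compactness(diffuse))  # True
```

This makes the reported F1 pattern plausible: the smaller the average distance to the centroid, the less the sub event type overlaps with its neighbours in embedding space, and the easier the classification.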
Figure 2: Topic Distance “Other”
Figure 3: Topic Distance “Suicide Bomb”

If one were to use a NER model on an entire document, problems would arise in selecting the relevant information for a given event. Since the exact event sentences were determined in a previous processing step – i.e., the sentences of a document that contain the event arguments – we assume that the extraction of the applicable location, time and involved persons with a NER model can work in our case. An example using the spaCy NER model [17] is described below. To depict the results, we use an entry for the sub event type “peaceful protest” from the ACLED database (Figure 4).

Figure 4: Example of NER

We can see that the NER model finds the time (DATE) as well as the location (GPE) of the event. Additionally, two persons are identified, namely “Daunte Wright” and “Adam Toledo”, and a cardinal (“about 75”) is given. While the information about the location and date of the event seems straightforward, the entities marked as person and cardinal need to be set into the correct context in order to understand their role. For assigning meaning to the identified entities, Semantic Role Labeling (SRL) can be applied. SRL is “the computational identification and labeling of arguments in text”. It aims at determining “who” did “what” to “whom”, “where”, “when”, and “how” [18]. Thus, the arguments that can be found in a text with the help of SRL are in line with the arguments that we need to define a meaningful representation of an event. A number of annotation sets for SRL exist; we draw on the ones proposed for the Proposition Bank corpus [19]. SRL can help to find the correct context for previously identified entities. These relations then support the correct interpretation of already extracted event arguments. For the example given above, the SRL model by AllenNLP [20] provides the annotations displayed in Figure 5.
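How the recognized entities can be routed into the slots of the event template may be sketched as follows. The entity list is a mocked stand-in for spaCy output on the “peaceful protest” example, and the concrete DATE and GPE values (“Saturday”, “Portland”) are hypothetical, since the ACLED example text is not reproduced here:

```python
# Mocked (text, label) pairs as a spaCy NER model might return them;
# the label names follow spaCy's OntoNotes scheme.
entities = [
    ("Saturday", "DATE"),          # hypothetical value
    ("Portland", "GPE"),           # hypothetical value
    ("about 75", "CARDINAL"),
    ("Daunte Wright", "PERSON"),
    ("Adam Toledo", "PERSON"),
]

def fill_event_slots(entities):
    """Route NER labels into event-template slots (mapping is illustrative)."""
    slot_for_label = {"DATE": "time", "TIME": "time",
                      "GPE": "location", "LOC": "location"}
    slots = {"time": None, "location": None, "persons": []}
    for text, label in entities:
        if label == "PERSON":
            # Persons are collected, but their role (agent, patient,
            # purpose, ...) is left to the SRL step.
            slots["persons"].append(text)
        elif label in slot_for_label and slots[slot_for_label[label]] is None:
            slots[slot_for_label[label]] = text
        # Cardinals such as "about 75" are deliberately not routed here:
        # only SRL reveals that they quantify the agent.
    return slots

print(fill_event_slots(entities))
```

The sketch makes the limitation explicit that the paper discusses: DATE and GPE map cleanly onto “When” and “Where”, whereas PERSON and CARDINAL entities remain ambiguous until SRL assigns them a role.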
Figure 5: Example of SRL

We see that the entities for date and location are marked as a temporal and a location modifier, respectively. This means that, with regard to their function in the text, no new information is gained by using SRL. This was expected. However, the entities “about 75”, “Daunte Wright” and “Adam Toledo” are now set into the correct contexts. “About 75 people”, which was not detected as a person entity, is now marked as the agent of the event sentence. “Daunte Wright” and “Adam Toledo” are components of a purpose modifier, showing that they are not agents or patients of the event but part of the reason the event is happening. This additional information contributes to a better understanding of the named entities and of the event in general. Sixth, all information is inserted into a predefined template, so that a standardized event representation is created. This template includes all of the event arguments mentioned above as well as a unique identifier. On the basis of the completed template, a further comparison of the events contained and identified in the database can be performed. This is necessary because it can be assumed that a specific event is not reported only once within the media space. In order to create a coherent situational picture, reports on identical events, insofar as they do not contain useful new information, do not all have to be processed further. In contrast to the identity management step mentioned earlier, the goal here is to find duplicate events, which may have come from different sources, and not to find duplicated sources/reports/articles. Co-reference resolution already addresses the distinction or identification of event representations; however, it can be assumed that an event cluster found with the described approach for co-reference resolution may include instances of the same main event but different sub events.
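The duplicate-event comparison on completed templates can be sketched as a simple heuristic: two records are treated as the same event if type, time and location agree. The records below are invented examples, and the matching criterion is illustrative, not our actual comparison logic:

```python
def same_event(a, b):
    """Heuristic duplicate check on completed event templates (sketch)."""
    return (a["event_type"], a["time"], a["location"]) == \
           (b["event_type"], b["time"], b["location"])

def deduplicate(records):
    """Keep only the first report of each event."""
    kept = []
    for record in records:
        if not any(same_event(record, k) for k in kept):
            kept.append(record)
    return kept

reports = [
    {"event_type": "peaceful protest", "time": "2021-04-16",
     "location": "Portland", "source": "feed A"},
    {"event_type": "peaceful protest", "time": "2021-04-16",
     "location": "Portland", "source": "feed B"},   # same event, other source
    {"event_type": "riots", "time": "2021-04-16",
     "location": "Portland", "source": "feed A"},   # different sub event type
]
print(len(deduplicate(reports)))  # 2: the two protest reports collapse into one
```

Note how this step differs from the earlier co-reference clustering: the two protest reports from different sources collapse, while the riot report survives as a distinct sub event even though all three might share one main event.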
Seventh, this representation is transferred into a symbol from a given library, and a viewer service draws the chosen symbol on a map. By extracting events from a text collection and drawing the respective symbols on a map, or by creating time-lines of events, we represent situations and courses of action as they are described in the text collection.

3. Media Event Extraction
A media event is the coherent reporting, commenting or discussion of a certain topic in the media. Media events can be very short, e.g., when a specific incident is reported only once and then “forgotten”. However, media events can also last much longer, e.g., when information is progressively updated, commented on and discussed. In that case, different reporters can contribute to the same media event. Topics of media events can be entities like persons, institutions etc., or other events – among them both physical world events (events as described in the previous section, including events that never took place) and other media events. Media event extraction aims at structuring the media space and giving an overview of the topics being discussed and of the dynamics of the discussions. In essence, it can be implemented as a clustering of the media space and the distillation and description of those cluster features that are deemed relevant. It enables browsing through topics and the distant reading of communication threads. A paradigmatic use case is the investigation of social media discourse. Topics are modelled as distributions over content words derived from documents. To this end, we apply Latent Dirichlet Allocation (LDA, [21]): based on the vocabulary of a document, topics can be assigned to it with a certain probability. Topics therefore give rise to a soft, i.e., probabilistic, clustering. Documents can also be clustered directly, without creating topic models first.
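The LDA-based soft clustering can be sketched with scikit-learn's implementation. The toy corpus below stands in for the media space; in practice the vocabulary would be multilingual and far larger:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "protest march city police demonstration",
    "police protest demonstration crowd city",
    "election vote parliament candidate ballot",
    "ballot election candidate vote campaign",
]

# LDA works on word counts: topics are distributions over content words.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)   # per-document topic probabilities

# Each row is a probability distribution over the two topics: this is the
# soft, probabilistic clustering described above.
print(doc_topics.shape)  # (4, 2)
```

Inspecting `lda.components_` additionally yields, per topic, the characteristic word weights, which is the basis for keyword descriptions of the kind shown in Figure 6.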
Clustering [22] is applied to document representations that also capture contextual information, making use of automatic feature engineering such as autoencoders [10]. The resulting clusters can be described by their characteristic keywords in a subsequent step.

Figure 6: Topic visualisation on news articles

Figure 6 shows a map of topics for a text collection (left) and a keyword distribution for a selected topic (right). Topic models and text clusters give the distant reader an overview of the topics under discussion and enable her to identify relevant clusters that are to be investigated further. Topics that are significantly prevalent in an underlying data set can be considered correlated with media events. The temporal distribution of a topic’s volume can be understood as communication behaviour, that is, the internal dynamics of a media event.

4. Conclusion and Outlook
We have described two different techniques for distilling information from large text collections, namely event extraction and media event extraction. By event extraction, information on events is identified in texts and transformed into standardized, structured representations. These representations give an overview of the events that are referred to within the texts. They display an essential aspect of the world view conveyed by a text collection. We can compare different text collections with regard to the events they describe and, thus, extract the divergences of their world views. By media event extraction, we identify and describe the topics under discussion in the media space. If we take the time dimension into consideration, we can describe the dynamics of discourse and topic changes. It remains an interesting challenge to link event representations and topic/media event representations in order to answer questions on the discursive context of events and on the roles events play in creating world views through texts.
An obvious link between the two kinds of representations is that (physical world) events can be topics themselves or occur in topics. Topic representations can then help to describe the narratives around these events. A further way to link events and topics is via social networks. Persons can be involved in topics, either actively or as being affected. Persons are related to each other by occurring within the same topics, possibly in different roles. Moreover, they act as authors, recipients and/or intermediaries in the exchange and distribution of information. Just as persons appear in media events, they also appear in physical world events, again taking various roles. They are connected both via the events themselves and via their reporting. Thus, networks of persons (social networks) can be connected to both events and media events and can thereby be a glue for linking them. Social network analysis enables us to attribute event descriptions and media events to social groups. We assume that event descriptions and topics are a plausible basis for describing the world views of these groups and for analysing potential conflicts between them. Thus, we advance the hypothesis that event extraction, media event extraction and social network analysis can be promising tools for ideology analysis. “Further research is needed.”

References
[1] T. Krumbiegel, A. Pritzkau, H.-C. Schmitz, Distant reading and event extraction (2021). URL: https://www.fdr.uni-hamburg.de/record/9672/files/KI2021_CHAI_submission3.zip?download=1.
[2] S. Kent, T. Krumbiegel, A. Pritzkau, H.-C. Schmitz, Conflict monitoring, in: Artificial Intelligence, Machine Learning and Big Data for Hybrid Military Operations (AI4HMO), NATO, 2021.
[3] M. Andree, T. Thomsen, Atlas der Digitalen Welt, Campus, 2020.
[4] L. Manovich, Cultural Analytics, MIT Press, Cambridge/Mass., 2020.
[5] P. W. Singer, E. T. Brooking, LikeWar: The Weaponization of Social Media, First Mariner Books, 2019.
[6] F. Moretti, Distant Reading, Verso, London, 2013.
[7] S. Jänicke, G. Franzini, M. F. Cheema, G. Scheuermann, On Close and Distant Reading in Digital Humanities: A Survey and Future Challenges, in: R. Borgo, F. Ganovelli, I. Viola (Eds.), Eurographics Conference on Visualization (EuroVis) – STARs, The Eurographics Association, 2015. doi:10.2312/eurovisstar.20151113.
[8] P. Domingos, A few useful things to know about machine learning, Commun. ACM 55 (2012) 78–87. doi:10.1145/2347736.2347755.
[9] K. Krippendorff, Content Analysis: An Introduction to Its Methodology, 4th ed., Sage, Los Angeles, 2018.
[10] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, Cambridge/Mass., 2016.
[11] W. Xiang, B. Wang, A survey of event extraction from text, IEEE Access 7 (2019) 173111–173137. doi:10.1109/ACCESS.2019.2956831.
[12] N. Becker, T. Krumbiegel, FKIE_itf_2021 at CASE 2021 Task 1: Using small densely fully connected neural nets for event detection and clustering, in: Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021), Association for Computational Linguistics, Online, 2021, pp. 113–119. doi:10.18653/v1/2021.case-1.15.
[13] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). arXiv:1810.04805.
[14] C. Raleigh, A. Linke, H. Hegre, J. Karlsen, Introducing ACLED: An armed conflict location and event dataset: Special data feature, Journal of Peace Research 47 (2010) 651–660. doi:10.1177/0022343310378914.
[15] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. arXiv:1907.11692.
[16] S. Kent, T. Krumbiegel, CASE 2021 Task 2: Socio-political fine-grained event classification using fine-tuned RoBERTa document embeddings, in: Proceedings of the 4th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2021), Association for Computational Linguistics, Online, 2021, pp. 208–217. doi:10.18653/v1/2021.case-1.26.
[17] M. Honnibal, I. Montani, spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing, 2017. To appear.
[18] L. Màrquez, X. Carreras, K. C. Litkowski, S. Stevenson, Semantic Role Labeling: An Introduction to the Special Issue, Computational Linguistics 34 (2008) 145–159. doi:10.1162/coli.2008.34.2.145.
[19] M. Palmer, D. Gildea, P. Kingsbury, The Proposition Bank: An annotated corpus of semantic roles, Computational Linguistics 31 (2005) 71–106.
[20] P. Shi, J. Lin, Simple BERT models for relation extraction and semantic role labeling, ArXiv abs/1904.05255 (2019).
[21] D. M. Blei, A. Y. Ng, M. I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003) 993–1022.
[22] D. Anastasiu, A. Tagarelli, G. Karypis, Data Clustering: Algorithms and Applications, 2nd ed., Chapman and Hall/CRC, Boca Raton/FL, 2018, pp. 305–338.