Towards Discourse Parsing-inspired Semantic Storytelling

Georg Rehm¹, Karolina Zaczynska¹, Julián Moreno-Schneider¹, Malte Ostendorff¹, Peter Bourgonje¹, Maria Berger¹, Jens Rauenbusch², André Schmidt², and Mikka Wild²

¹ DFKI GmbH, Alt-Moabit 91c, 10559 Berlin, Germany
² 3pc GmbH Neue Kommunikation, Prinzessinnenstraße 1, 10969 Berlin, Germany
Corresponding Author: Georg Rehm – georg.rehm@dfki.de

Abstract. Previous work of ours on Semantic Storytelling uses text analytics procedures including Named Entity Recognition and Event Detection. In this paper, we outline our longer-term vision on Semantic Storytelling and describe the current conceptual and technical approach. In the project that drives our research we develop AI-based technologies that are verified by partners from industry. One long-term goal is the development of an approach for Semantic Storytelling that has broad coverage and that is, furthermore, robust. We provide first results on experiments that involve discourse parsing, applied to a concrete use case, “Explore the Neighbourhood!”, which is based on a semi-automatically collected data set with documents about noteworthy people in one of Berlin’s districts. Though automatically obtaining annotations for coherence relations from plain text is a non-trivial challenge, our preliminary results are promising. We envision our approach to be combined with additional features (NER, coreference resolution, knowledge graphs).

Keywords: Semantic Storytelling · Natural Language Processing · Discourse Parsing · Rhetorical Structure Theory · Penn Discourse TreeBank

1 Introduction

Cultural institutions such as museums, archives or libraries often rely on public funding and therefore need to communicate their value to the public constantly. One successful way to achieve this goal is to employ storytelling, which can be defined as creating emotional, interactive narratives in a digital format.
Storytelling enables cultural institutions to make use of their digitized collections, demonstrating their relevance and reaching out to new audiences. Although extremely large amounts of digital content are available, the curation of stories is typically still performed by human knowledge workers. This calls for automated procedures. Such procedures should 1) label the content for several types of metadata semi-automatically, allowing for relevant categorisation, and 2) process the individual content pieces to present the information contained in them to a knowledge worker in an intuitive way. Since cultural organisations are often unlikely to be able to face this challenge on their own, we develop a platform supporting this use case in the technology transfer project QURATOR. Our goal is to develop semi-automatic technologies that keep the human in the loop and allow for fast, efficient and intuitive exploration of large and highly domain-specific data sets. Relating events into a schematic structure, i. e., storytelling, and ordering them, e. g., in terms of topic, locality or causal or temporal relationships, aid humans in finding meaningful patterns in data [3].

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

In earlier work, we described approaches to Semantic Storytelling making use of Named Entity Recognition (NER) and Event Detection [16, 24, 23]. In this article, we explore ways to present to a knowledge worker the semantic structure between text segments in an incoming text collection, making it possible to find interesting and surprising connections and information inside texts regarding a predefined topic. We focus on means of relating text segments to each other by borrowing from frameworks for the processing of coherence relations. From Rhetorical Structure Theory (RST) [12] we borrow the idea that larger sequences of texts (i.
e., non-elementary discourse units) are related, moving beyond the shallow parsing of individual coherence relations. From the Penn Discourse TreeBank (PDTB) [20] we use the sense inventory and perform a series of experiments, relating text segments according to the four top-level classes of the PDTB sense hierarchy. The experiments are centered around the use case “Explore the Neighbourhood!”. This tool, currently in development, is an urban exploration app that makes use of documents on the Berlin district of Moabit. It allows users to follow stories, created by an editor semi-automatically, while exploring the district both physically and digitally.

The remainder of this paper is structured as follows. Section 2 reviews relevant work, in particular, approaches using discourse relations in text. Section 3 explains the use case in more detail. Section 4 provides a technical definition, while Section 5 outlines the experiments on the data set we created. Finally, Section 6 provides a summary and suggests directions for future work.

2 Related Work

The act of storytelling, and the resulting stories, can be seen as a strategy to uncover meaningful patterns in the world around us [3]. At the core of research on classical narratology, essential to storytelling, is the uncovering of the rules that underlie this strategy, or at least the ways to best achieve the goal. Early work on narratology is described in [1], defining a narrative as a discourse following a plot structure that has a chronological and logical event order. More recently, [5] applied this definition of plot structure to (chrono)logically ordered events. Another line of work on narratology is represented by the work of [21], who analyzes the basic, irreducible, structural elements of Russian folk tales. More recently, Propp’s work was used by [33] for their story detection and generation systems.
The same authors, in [32], make use of another field of research related to text coherence, namely that of the processing of coherence relations. They apply the work of [29] on hierarchical discourse relations to work out how paragraphs behave when being used as discourse-structural units in news articles, with the ultimate goal of understanding the importance and temporal order of story items. Our work follows a similar approach, but uses PDTB sense hierarchy labels. The PDTB [20] is an (English) corpus of Wall Street Journal articles (a subsection of the Penn TreeBank [13]) annotated for individual discourse relations. We adopt the PDTB sense hierarchy because the PDTB is the single largest corpus annotated for coherence relations and therefore the corpus best facilitating machine-learning based approaches. Due to the shallow nature of the PDTB framework (it only annotates individual relations, without making commitment to larger text structure, or mutual importance or relevance), we additionally draw on RST [12], particularly the notion of nuclearity. In RST, a text is divided into Elementary Discourse Units, which are joined together, forming either a mono-nuclear relation (with one unit being the more prominent, important or relevant nucleus and the other, less prominent unit being the satellite) or a multi-nuclear relation. It is this notion of prominence, or relative importance to the storyline at hand, that we adopt from RST.

With regard to application-driven approaches, much work has been done on the final, surface realisation aspect of text generation [7, 8]. An approach more closely resembling ours is described by [17], who use dependency parsing in combination with discourse relations to determine sentence relations. In our approach, however, in addition to finding relevant articles for the user, we want to classify the type of relation the articles in question have to each other.
In our own previous work we described tools supporting the processing and generation of digital content with a strong industry focus, as is equally the case in the current context of the QURATOR project. The functionality of the curation technology platform is explained in [23]. [24] presents an example of this platform applied to the use case of a personal communication archive, i. e., a collection of approx. 2,800 letters exchanged between the German architect Erich Mendelsohn and his wife Luise between 1910 and 1953. From this, we extracted, i. a., named entities, temporal expressions and events, combined these and used them to track and visualise the movement (across the globe) of Erich and Luise. Additional prototypes are presented in [15] and [22].

3 Industry Needs and Applications: The “Explore the Neighbourhood!” Use Case

“Explore the Neighbourhood!” is a concept for a mobile app, which engages urban explorers in semi-automatically created stories, making use of digitized cultural collections. Moabit is a district in Berlin and was chosen due to its rich history and lively present. Such an app could be made available by museums, cities or municipalities, tourist information offices or local marketing campaigns. End users might be tourists, pupils studying in or visiting the neighborhood, or residents. Value is created for all parties by entertaining and educating users whilst communicating the district’s or cultural institution’s relevance. The app offers both curated and generated stories. While in a final concept of “Explore the Neighbourhood!” these differences might not be noticed by the end user, in the following we will present each approach separately to describe the concept more precisely. We plan to fully integrate the approach described in Section 4.

3.1 Curated Stories

Upon launching the app, a set of interactive stories is offered to the end user, who can influence the story’s direction, depth, and pace.
Nevertheless, each story still contains significant plot elements curated by an editor. The curation process requires the editor to define several storylines in a customised tool, which contains search capabilities and a recommendation system (Figure 1), both of which help surface relevant content for each step along a story path. Such a tool is made possible by rich metadata, which allow queries such as “poems describing Berlin in a praising tone” (text classification and analysis detecting locations and sentiment) or “photos showing Kurt Tucholsky next to a church” (image classification and analysis detecting people and objects, in this case churches). Figure 1 shows the user interface of such a tool.

Fig. 1: The smart authoring environment

Curated stories can be published to the app (Figure 2a). Stories may contain geographical points of interest within Moabit which are connected through an overall story arc, such as a biography. The exemplary stories depicted in this article follow the biography of Kurt Tucholsky (Figure 2b), a German-Jewish journalist and writer born in Moabit in 1890. The stories contain locations, historic photos and maps, scanned original works and editorial content.

Fig. 2: The app allows the exploration of different aspects of Moabit. (a) Description of Kommune 1; (b) Kurt Tucholsky’s biography

The existence of several storylines within a story, as well as several stories in parallel, allows for connections to be forged. These connections can be based on common topics, locations, or other parameters that support a consistent and emotional narrative. Users can follow one path through a story, choose to dive deeper into certain aspects of it, e. g., Kurt Tucholsky (Figure 2b), change their perspective onto a topic by exploring alternative stories, or switch to a completely different, yet connected story.
The consumable stories are linked in a network and limited only by the number of pieces of information and the size of the network created by the editor, who can extend it continuously.

3.2 Generated Stories

Unlike curated ones, generated stories are created entirely by a storytelling engine. This is made possible by a set of well-chosen parameters which influence the automatic selection and connection of content. These parameters are defined by several factors:

– a chosen topic (initiated through a keyword or phrase),
– the type of story being told (such as biography or travel guide),
– users’ preferences (such as available time, current sentiment, preferred mode of travel),
– users’ behavior (such as current location, walking speed, orientation).

Based upon the factors listed above, “Explore the Neighbourhood!” automatically generates a story by selecting the right content based on its rich metadata. The end result, which is the story consumed by the users, may not look so different from editor-curated stories. Nevertheless, since generating a story happens in real-time, it constantly adapts to users’ choices, which creates a more personal and more interactive experience.

4 Semantic Storytelling: Technical Description

One of the goals of our Semantic Storytelling system is to aid knowledge workers in selecting relevant pieces of content, e. g., the app editor who wants to curate stories for the app. Following the prototype of the “Explore the Neighbourhood!” app (Section 3), this section describes the technical details of the back-end.

Let us assume the following situation. A user is visiting a city and wants information about a topic T regarding the neighbourhood. The goal of the curation prototype is, then, to identify and to suggest new content for the app that can be included in the user’s tour. To do so, we first have to initialise the topic T, for example, as a sentence, keyword or named entity.
Next, the tool has to identify whether, for example, a document in a collection or a web page is relevant for topic T and, if so, whether it is important for T. Finally, we identify the semantic relation between incoming texts and the provided topic T, which could be, among others, background, cause, contrast or example. In the following, we describe these steps in more detail (Figure 3).

Fig. 3: Architecture of the Semantic Storytelling approach. [Diagram: incoming content (self-contained document collection, web content, Wikipedia) and a topic T (possible instantiations: complete document, summary, claim or fact, event, named entity) are processed in three steps: 1) determine the relevance of a segment for T (a: document relevance, b: segment relevance), yielding a ranked list of text segments; 2) determine the importance of a segment (e. g., A isLessImportantThan B); 3) determine the discourse relation between segment and topic (e. g., Comparison, Expansion). The resulting stories are presented to the user in the “Explore The Neighbourhood!” GUI.]

Step 1: Determine the Relevance of a Segment for a Topic. The approach starts with a topic T, instantiated through a text segment such as a complete document, a headline or a named entity. To identify content pieces relevant for T, we process incoming textual content, like a self-contained document collection, a systematically compiled corpus or a knowledge base. For each piece of content, we need to decide whether its topic is relevant for T, which can be computed in various ways. We can employ topic modeling (LDA, LSA) or, without explicitly modeling topics, we can perform pair-wise comparisons of document similarity. Document pairs with a high similarity score are assumed to cover the same topic. Therefore, we start with the seed document $d_s$, of which we know that it represents T, and measure its similarity to other candidates.
To compute semantic similarity, documents are represented as numerical vectors. Classical methods like bag-of-words or tf-idf encode documents as sparse vectors [25], while neural methods (word2vec, sent2vec, doc2vec, see e. g., [14, 19, 27]) produce dense representations. In both cases, cosine similarity can be used to compute the similarity of the document vectors.

Step 2: Determine the Importance of a Segment. Once we have determined all documents d which are related to T, we need to determine the importance of d (or its segments or sentences) with regard to T. There is no off-the-shelf approach to determine the importance of a segment with regard to a topic, but various cues and indicators can potentially be exploited. One way of doing this is to borrow from RST, especially the notion of nuclearity. Constructing an RST tree involves decisions with regard to the status of text segments including their discourse relation to other segments and also regarding their role as a nucleus (the important core part of a relation) or satellite (the contributing part of a relation) in the context of a specific discourse relation. Two segments are assigned either a satellite-nucleus (S-N), nucleus-satellite (N-S) or a nucleus-nucleus (N-N) structure. This sub-task can be done in isolation [9, 28], or in conjunction with the relation classification task [11]. When performed iteratively, this pair-wise classification can result in a set of most important segments regarding T.

Another way of determining topical importance is to treat it as a segment-level question answering task. Given a document d consisting of text segments ($t_1, t_2, \ldots, t_n$), the aim is to find the segment $t_i$ that contains the answer to the input question (i. e., topic T). Transformer language models have achieved state-of-the-art results for question answering [6], suggesting that those model architectures would be beneficial for storytelling.
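The pair-wise similarity comparison described for Step 1 can be illustrated with a minimal Python sketch that uses only the standard library. It is not the implementation used in the project: the whitespace tokeniser, the smoothed idf weighting, the function names (`tfidf_vectors`, `cosine`, `rank_candidates`) and the example sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokeniser)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf (an assumption): terms found in every document keep a small weight.
        vectors.append(
            {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(seed, candidates):
    """Rank candidate documents by their similarity to the seed document."""
    vectors = tfidf_vectors([seed] + candidates)
    seed_vec = vectors[0]
    scored = [(cosine(seed_vec, vec), doc) for vec, doc in zip(vectors[1:], candidates)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical seed document and candidates.
seed = "Kurt Tucholsky was born in Moabit"
candidates = ["Tucholsky was a journalist born in Moabit", "The brewery produces beer"]
for score, doc in rank_candidates(seed, candidates):
    print(round(score, 2), doc)
```

Dense representations (word2vec, sent2vec, doc2vec) could be compared with the same cosine function once they are available as vectors; only the encoding step would change.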
Step 3: Semantic or Discourse Relation between two Segments. After having established the relevance and relative importance, we proceed with determining the semantic or discourse relation that exists between the text segments and topic T. Our initial experiments are based on the PDTB because, with more than 1.1 million tokens, it is considerably larger than the RST Discourse TreeBank [4] with approx. 200k tokens. We adopt the PDTB’s sense hierarchy, which comprises four top-level classes, 16 types on the second level and 23 subtypes on the third. For now, our experiments are based on the top-level senses, Temporal, Contingency, Comparison and Expansion, and an additional label, None.

5 Experiment for “Explore the Neighbourhood!”

In this section, we describe our first experiments, which aim to explore the suitability of the approach and help us gain an understanding of what we can achieve in the long run. We concentrate on step 3. To this end, we created a data set of crawled web documents about the Berlin district Moabit, and implemented initial experiments to classify discourse relations between text segments inside the data set. We would like to show a comparison with similar tools but, to the best of our knowledge, there are no similar tools that extract semantic relations from intra-document text segments (using PDTB).

5.1 Data Set

The data set is composed of documents containing information and stories connected to the district of Moabit in Berlin. We are in the first stages of developing this data set. In the long term, the idea is to put together a much larger collection of documents focused on Moabit so that it can be used for the Semantic Storytelling prototype. We used the focused crawler Spidey³, which returns a list of website URLs based on a set of predefined query terms. We manually defined 28 queries about interesting places, buildings, or persons connected to Moabit.
Some of these terms are Moabit, Moabit gentrification, Kleiner Tiergarten, Kulturfabrik Moabit, Berlin Central Station and Kurt Tucholsky. After obtaining the website URLs, we crawl the pages and strip the boilerplate from their content and metadata⁴. The resulting data set is composed of slightly more than 100 documents that have been filtered manually in a second step.

5.2 Classifiers for Discourse Relation between Text Segments

Our aim is to extract discourse relations from texts and thereby to be able to extract relevant content from a text collection and, in the longer run, to find new storylines composed of semantically related parts of different text segments taken from the collection. We train a relation sense classifier on PDTB2 [20] and apply it to two pieces of content. For training, we use the two arguments of a relation, but at a later point we deploy it using individual sentences. We argue that the sentence level is the most appropriate level to use as input for our classifier (as opposed to the shorter token or phrase level, or the longer paragraph level) and that the discrepancy between argument shapes and typical sentence lengths (itself very much dependent on the domain) is tolerable.

³ https://github.com/vikrambajaj22/Spidey-Focused-Web-Crawler
⁴ We use Newspaper3k, see https://github.com/codelucas/newspaper

Classifier Model. Classifying the discourse relation between sentence pairs requires a semantic understanding of the sentences. We encode the text as deep contextual representations with a language model based on the Transformer architecture [30]. To be precise, we use the pre-trained language model DistilBERT [26], a distilled version of Bidirectional Encoder Representations from Transformers (BERT) [6]⁵. BERT performs well for document classification tasks [18]. To classify the relation between two texts, we employ a Siamese architecture [2].
In contrast to a classical Siamese model, in which a binary classifier is employed on the output of the two identical sub-networks, we feed the sub-network output into a multi-label classifier, as illustrated in Figure 4.

Fig. 4: The architecture of the Siamese BERT model for the classification of discourse relations between two text segments $d_1$ and $d_2$. The output of the classification layer $\hat{y}$ holds the predicted semantic relation according to the top-level PDTB2 senses. [Diagram: $d_1$ and $d_2$ are each encoded by BERT; the encodings pass through Concatenation, MLP and Classification Layer, yielding $\hat{y} = \mathrm{SemRel}(d_1, d_2)$.]

Text snippets $d_1$ and $d_2$ are inputs to the classifier. The DistilBERT architecture consists of six hidden layers with 768 units each (66M parameters). BERT is used in a Siamese fashion such that $h_i = \mathrm{BERT}(d_i)$ is the encoded representation of text $d_i$, where $h_i$ is the last hidden state of the last BERT layer. The final feature vector $x_f$ is a concatenation of combinations of the text representations:

$$x_f = \left[h_1;\ h_2;\ |h_1 - h_2|;\ h_1 * h_2;\ \frac{h_1 + h_2}{2}\right] \tag{1}$$

On top of the concatenation, we implement a Multi-Layer Perceptron (MLP). The MLP consists of two fully-connected layers, $F_f(\cdot)$ and $F_g(\cdot)$, where each layer has 100 units and $\mathrm{ReLU}(\cdot)$ is the activation function. The discourse relation $\hat{y}$ is classified on the basis of the feature vector $x_f$ as follows:

$$\hat{y} = \sigma(F_f(\mathrm{ReLU}(F_g(x_f)))) \tag{2}$$

The logistic softmax function $\sigma(\cdot)$ generates probabilistic multi-label classifications. The dimension of $\hat{y}$ corresponds to the number of classification labels, which are the four top-level PDTB2 senses (Temporal, Contingency, Comparison, Expansion) and one additional dimension (None).

⁵ We use the PyTorch implementation by HuggingFace [31].

Transfer Learning. Our target corpus of texts for “Explore the Neighbourhood!” does not include any kind of annotated training data. Thus, we cannot use the data set to train the classifier. Instead, we rely on the PDTB2 data set.
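As an aside on the classifier model, the feature combination of Equation (1) and the classification step of Equation (2) can be sketched in plain Python. This is an illustrative sketch only, not the PyTorch model used in the experiments: encodings are plain lists of floats rather than tensors, all weights are supplied by the caller instead of being learned, and the function and parameter names (`combine_features`, `classify`, `w_hidden`, `w_out`) as well as the five-dimensional output layer are assumptions.

```python
import math


def combine_features(h1, h2):
    """Eq. (1): [h1; h2; |h1 - h2|; h1 * h2; (h1 + h2) / 2], all element-wise."""
    if len(h1) != len(h2):
        raise ValueError("both encodings must have the same dimension")
    abs_diff = [abs(a - b) for a, b in zip(h1, h2)]
    product = [a * b for a, b in zip(h1, h2)]
    mean = [(a + b) / 2 for a, b in zip(h1, h2)]
    return list(h1) + list(h2) + abs_diff + product + mean


def linear(x, weights, bias):
    """Fully-connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h1, h2, w_hidden, b_hidden, w_out, b_out):
    """Eq. (2): softmax over the output layer applied to ReLU of the hidden layer."""
    x_f = combine_features(h1, h2)
    hidden = relu(linear(x_f, w_hidden, b_hidden))
    return softmax(linear(hidden, w_out, b_out))


# Two 768-dimensional encodings yield a 5 * 768 = 3840-dimensional feature vector.
print(len(combine_features([0.0] * 768, [0.0] * 768)))
```

The absolute difference and the element-wise product make the combined vector sensitive to how the two encodings differ and agree, which a plain concatenation of $h_1$ and $h_2$ would leave for the MLP to learn.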
Training is performed with batch size b = 16, dropout probability d = 0.1, learning rate $\eta = 2 \times 10^{-5}$ (Adam optimizer) and 5 training epochs. These hyperparameters are the ones proposed by [6] for BERT fine-tuning.

PDTB Relation   Precision   Recall   F1-score   Support
Comparison      0.50        0.47     0.48       1598
Contingency     0.38        0.65     0.48       1582
Expansion       0.50        0.79     0.61       2993
Temporal        0.51        0.55     0.53        869
None            0.49        0.73     0.59       1078
Micro avg.      0.47        0.67     0.55       8120
Macro avg.      0.48        0.64     0.54       8120

Table 1: Results of the multi-class prediction trained on the PDTB2 data set with an 80-20 train-test split.

The results derived from an 80-20 train-test split are shown in Table 1. For evaluation, we use the multi-class micro-averaged F1 metric, which calculates the metric globally by counting the total true positives, false negatives and false positives. In a multi-class classification setup, the micro average is preferable if class imbalance is suspected. In the end, we achieve a micro-averaged F1 of 0.55. Because we have not implemented features relating to the connective, our classifier performs worse than current state-of-the-art approaches.

5.3 First Experiment on Use Case Data Set and Discussion

Given the PDTB2-based classifier, we continue to find discourse relations within the corpus containing documents for the “Explore the Neighbourhood!” use case. As a preprocessing step, we first exclude all non-English documents and group documents by topic based on the query terms for the focused crawler. Next, we find document pairs among the topic groups (only semantically similar document pairs are considered). More precisely, documents are represented as tf-idf vectors and the cosine similarity of a document pair $d_a$ and $d_b$ must be above a fixed threshold ($\mathrm{cosine}(d_a, d_b) > 0.15$). Our classifier is trained to detect sentence-level relations; thus, we also split the documents into sentences⁶.
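The selection of candidate sentence pairs can be sketched as follows. This is a hedged illustration rather than the code used in the experiment: it assumes that documents are already split into sentences, that document similarity is provided by a caller-supplied function (for instance tf-idf cosine similarity), and that every sufficiently long sentence of one document is paired with every sufficiently long sentence of the other, which the text does not state explicitly. The name `candidate_sentence_pairs` and its parameters are hypothetical; the defaults mirror the threshold of 0.15 and the five-word minimum reported for this experiment.

```python
from itertools import combinations, product


def candidate_sentence_pairs(docs, similarity, threshold=0.15, min_words=5):
    """Build cross-document sentence pairs for sufficiently similar documents.

    docs:       dict mapping a document id to its list of sentences
    similarity: callable (id_a, id_b) -> float, e.g. tf-idf cosine similarity
    """
    pairs = []
    for id_a, id_b in combinations(sorted(docs), 2):
        if similarity(id_a, id_b) <= threshold:
            continue  # skip document pairs that are not similar enough
        # Keep only sentences with at least `min_words` whitespace-separated words.
        long_a = [s for s in docs[id_a] if len(s.split()) >= min_words]
        long_b = [s for s in docs[id_b] if len(s.split()) >= min_words]
        pairs.extend(product(long_a, long_b))
    return pairs


# Hypothetical documents and precomputed similarity scores.
docs = {
    "a": ["Kurt Tucholsky was born in Moabit.", "Too short."],
    "b": ["He worked as a journalist and writer."],
    "c": ["Schultheiss brewed beer in Berlin for decades."],
}
scores = {("a", "b"): 0.4, ("a", "c"): 0.05, ("b", "c"): 0.15}
for pair in candidate_sentence_pairs(docs, lambda x, y: scores[(x, y)]):
    print(pair)
```

Because the number of pairs grows with the product of the sentence counts of both documents, the document-level threshold is what keeps the input to the classifier manageable.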
After excluding all sentences with fewer than five words, we end up with 96,796 sentence pairs that are passed to the classifier.

⁶ We use pySBD, see https://github.com/nipunsadvilkar/pySBD

Example 1. Topic: Farin Urlaub Moabit
Segment A: In April 2012 they released another album “auch” (“also”)
Segment B: At the age of 16, Vetter went on a school trip to London, and returned home as a punk with dyed blonde hair.
Scores: S. .51 | Co. .01 | Ct. .01 | E. .04 | T. .93 | N. .1

Example 2. Topic: Uschi Obermaier
Segment A: In 1968 and 1969 Obermaier starred in Rudolf Thome’s first two feature films, “Detektive” and “Rote Sonne” (“Red Sun”).
Segment B: She played maracas in the band Amon Düül, aka Amon Düül I, on two albums: Collapsing (1970, released by Metronome) and Disaster (1972, released by BASF Records [de]).
Scores: S. .39 | Co. .0 | Ct. .0 | E. .96 | T. .03 | N. .02

Example 3. Topic: AEG turbine factory
Segment A: It is an influential and well-known example of industrial architecture.
Segment B: However, when it came to AEG’s public image and public perception, the focus remained on Peter Behrens: the famous artist-cum-architect overshadowed the engineer.
Scores: S. .32 | Co. .62 | Ct. .16 | E. .15 | T. .01 | N. .06

Example 4. Topic: Kurt Tucholsky Moabit
Segment A: Admittedly, Tucholsky is seldom recognized as a philosopher.
Segment B: He saw himself as a left-wing democrat and pacifist and warned against anti-democratic tendencies – above all in politics, the military and justice – and the threat of National Socialism.
Scores: S. .25 | Co. .45 | Ct. .28 | E. .19 | T. .01 | N. .08

Example 5. Topic: Schultheiss Brewery
Segment A: “Good people drink good beer,” Hunter S Thompson once said, writing about a beverage that is considered to be typically German and is, of course, also popular in Berlin.
Segment B: Schultheiss is currently brewing far less beer than at the time of re-unification.
Scores: S. .16 | Co. .32 | Ct. .19 | E. .42 | T. .01 | N. .06

Table 2: Manually evaluated examples. For each sentence pair, the table shows the similarity score (S.)
between sentences and the prediction scores for each discourse relation (Co.=Comparison, Ct.=Contingency, E.=Expansion, T.=Temporal, N.=None)

To get a first impression of the applicability of our approach, and to motivate our next steps, we manually select five example sentence pairs and evaluate them qualitatively. The first line of Table 2 shows an example where the classifier correctly labels the discourse relation as Temporal, most likely because of the temporal markers included. In the second line, the approach correctly identifies the discourse relation as an Expansion, i. e., segment B can be seen as an extension of the biography described in segment A. Nevertheless, in other examples, the approach is often unable to handle coreference. The classifier often does not detect a discourse relation between two segments, even if those segments reference the same entity, when one segment uses a pronoun for the entity. By implementing a preprocessing step with rudimentary coreference resolution we expect the classification to improve significantly.

The classifier often predicts the label Comparison when specific lexical markers, such as however, but or while, appear in segment B, as in example 3. Example 4 is an exception, where the classifier predicts the relation Comparison correctly without needing a lexical marker (paraphrased as ‘Even if he is recognized as a philosopher, he saw himself as a political activist’), but generally we observe that this dependency on lexical features leads to wrong predictions. We see one reason in the fact that the sentences are taken from different sources, and the lexical markers for the discourse relation are therefore often missing, even if semantically the relation can be seen as a Comparison. This is the case in example 5, which is wrongly predicted as an Expansion while we interpret it as a Comparison. On the other hand, in other examples, the lexical markers cause false-positive errors.
Hence, future work will extend the number of preprocessing steps to better group text segments which have the same content and talk about the same entities, events or topics.

6 Conclusions

We describe first experiments in applying our Semantic Storytelling approach to an industrial use case. This use case, “Explore the Neighbourhood!”, makes it possible to interactively create a city guide with interesting, adaptive stories about a particular district, built upon user-dependent parameters such as predefined topics, keywords, etc. The basic idea is to automate storytelling by detecting discourse relations between text segments from different sources on the same topic, which makes it possible to detect and create new storylines extracted from a document collection. We describe the different steps needed to create a corresponding processing framework. In the experiment presented here, we focus on the third step of our approach, the classification of discourse relations between segments. By focusing more on steps one and two as described in Section 4, we will be able to improve the results in further experiments. For example, we expect the classification to improve significantly by using coreference resolution during preprocessing. One way of improving the coreference resolution would be to pretrain the classifier on the coreference task first [10]. As data sets are still limited, we will expand the data set for our needs and create, in the longer run, annotations to develop a gold standard.

Acknowledgements

The research presented in this article is funded by the German Federal Ministry of Education and Research (BMBF) through the project QURATOR (Unternehmen Region, Wachstumskern, no. 03WKDA1A). http://qurator.ai

References

1. Bal, M.: Narratology: Introduction to the Theory of Narrative. Trans. by Christine van Boheemen. Toronto: University of Toronto Press (1985)
2.
Bromley, J., Bentz, J., Bottou, L., Guyon, I., LeCun, Y., Moore, C., Säckinger, E., Shah, R.: Signature Verification using a Siamese Time Delay Neural Network. International Journal of Pattern Recognition and Artificial Intelligence 7(4) (1993)
3. Bruner, J.: The Narrative Construction of Reality. Critical Inquiry 18(1), 1–21 (1991)
4. Carlson, L., Marcu, D., Okurowski, M.E.: RST Discourse Treebank, LDC2002T07 (2002), https://catalog.ldc.upenn.edu/LDC2002T07
5. Caselli, T., Vossen, P.: The Event StoryLine Corpus: A New Benchmark for Causal and Temporal Relation Extraction. In: Proceedings of the Events and Stories in the News Workshop. pp. 77–86. ACL (2017)
6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In: NAACL-HLT. pp. 4171–4186 (2019)
7. Fan, A., Lewis, M., Dauphin, Y.: Hierarchical Neural Story Generation. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 889–898 (2018)
8. Fan, A., Lewis, M., Dauphin, Y.: Strategies for Structuring Story Generation. arXiv preprint arXiv:1902.01109 (2019)
9. Hernault, H., Prendinger, H., duVerle, D.A., Ishizuka, M.: HILDA: A Discourse Parser Using Support Vector Machine Classification. Dialogue & Discourse 1(3), 1–33 (2010)
10. Joshi, M., Levy, O., Zettlemoyer, L., Weld, D.: BERT for Coreference Resolution: Baselines and Analysis. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). pp. 5807–5812. Association for Computational Linguistics, Hong Kong, China (Nov 2019). https://doi.org/10.18653/v1/D19-1588, https://www.aclweb.org/anthology/D19-1588
11. Joty, S., Carenini, G., Ng, R.T.: CODRA: A Novel Discriminative Framework for Rhetorical Analysis. Computational Linguistics 41(3) (2015)
12.
Mann, W.C., Thompson, S.A.: Rhetorical Structure Theory: Toward a Functional Theory of Text Organization. Text 8, 243–281 (1988) 13. Marcus, M., Kim, G., Marcinkiewicz, M.A., MacIntyre, R., Bies, A., Ferguson, M., Katz, K., Schasberger, B.: The Penn Treebank: Annotating Predicate Argument Structure. In: Proceedings of the Workshop on Human Language Technology. pp. 114–119. HLT ’94, ACL (1994) 14. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient Estimation of Word Rep- resentations in Vector Space. In: 1st International Conference on Learning Repre- sentations (2013) 14 Georg Rehm et al. 15. Moreno-Schneider, J., Bourgonje, P., Rehm, G.: Towards User Interfaces for Se- mantic Storytelling. In: Yamamoto, S. (ed.) Human Interface and the Management of Information: Information, Knowledge and Interaction Design, 19th International Conference, HCI International 2017 (Vancouver, Canada). pp. 403–421. Lecture Notes in Computer Science (LNCS), Springer (2017) 16. Moreno-Schneider, J., Srivastava, A., Bourgonje, P., Wabnitz, D., Rehm, G.: Se- mantic Storytelling, Cross-lingual Event Detection and Other Semantic Services for a Newsroom Content Curation Dashboard. In: Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism. pp. 68–73. ACL (2017) 17. Nie, A., Bennett, E., Goodman, N.: DisSent: Learning Sentence Representations from Explicit Discourse Relations. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. pp. 4497–4510. ACL (2019) 18. Ostendorff, M., Bourgonje, P., Berger, M., Moreno-Schneider, J., Rehm, G., Gipp, B.: Enriching BERT with Knowledge Graph Embeddings for Document Classifi- cation. In: Proceedings of the GermEval 2019 Workshop (2019) 19. Pagliardini, M., Gupta, P., Jaggi, M.: Unsupervised Learning of Sentence Embed- dings Using Compositional n-Gram Features. 
In: Proceedings of the 2018 Confer- ence of the North American Chapter of the Association for Computational Lin- guistics: Human Language Technologies. pp. 528–540 (2018) 20. Prasad, R., Dinesh, N., Lee, A., Miltsakaki, E., Robaldo, L., Joshi, A., Webber, B.: The Penn Discourse Treebank 2.0. In: In Proceedings of LREC (2008) 21. Propp, V.Y.: Morphology of the Folktale. Publication ... of the Indiana University Research Center in Anthropology, Folklore, and Linguistics, University of Texas Press (1968) 22. Rehm, G., He, J., Schneider, J.M., Nehring, J., Quantz, J.: Designing User In- terfaces for Curation Technologies. In: Yamamoto, S. (ed.) Human Interface and the Management of Information: Information, Knowledge and Interaction Design, 19th International Conference, HCI International 2017 (Vancouver, Canada). pp. 388–406. Lecture Notes in Computer Science (LNCS), Springer (2017), part I 23. Rehm, G., Moreno-Schneider, J., Bourgonje, P., Srivastava, A., Fricke, R., Thom- sen, J., He, J., Quantz, J., Berger, A., König, L., Räuchle, S., Gerth, J., Wab- nitz, D.: Different Types of Automated and Semi-Automated Semantic Story- telling: Curation Technologies for Different Sectors. In: Rehm, G., Declerck, T. (eds.) Language Technologies for the Challenges of the Digital Age: 27th Interna- tional Conference, GSCL 2017, Berlin, Germany, September 13-14, 2017, Proceed- ings. pp. 232–247. Lecture Notes in Artificial Intelligence (LNAI), Gesellschaft für Sprachtechnologie und Computerlinguistik e.V., Springer (2018) 24. Rehm, G., Moreno-Schneider, J., Bourgonje, P., Srivastava, A., Nehring, J., Berger, A., König, L., Räuchle, S., Gerth, J.: Event Detection and Semantic Storytelling: Generating a Travelogue from a large Collection of Personal Letters. In: Caselli, T., Miller, B., van Erp, M., Vossen, P., Palmer, M., Hovy, E., Mitamura, T. (eds.) Proceedings of the Events and Stories in the News Workshop. pp. 42–51. ACL (2017) 25. 
Salton, G., McGill, M.: Introduction to Modern Information Retrieval. Computer Series, McGraw-Hill, New York (1983) 26. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. CoRR pp. 1–5 (2019) 27. Selivanov, D., Wang, Q.: text2vec: Modern Text Mining Framework for R. Com- puter software manual](R package version 0.4. 0) (2016) Towards Discourse Parsing-inspired Semantic Storytelling 15 28. Soricut, R., Marcu, D.: Sentence Level Discourse Parsing using Syntactic and Lex- ical Information. In: Proceedings of the 2003 Human Language Technology Con- ference of the North American Chapter of the Association for Computational Lin- guistics. pp. 228–235 (2003) 29. Van Dijk, T.A.: News as Discourse. Routledge (2013) 30. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention Is All You Need. Advances in Neural Information Processing Systems 30 (Nips), 5998–6008 (jun 2017) 31. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Brew, J.: HuggingFace’s Transformers: State- of-the-art Natural Language Processing (oct 2019) 32. Yarlott, W.V., Cornelio, C., Gao, T., Finlayson, M.: Identifying the Discourse Function of News Article Paragraphs. In: Proceedings of the Workshop Events and Stories in the News 2018. pp. 25–33 (2018) 33. Yarlott, W.V.H., Finlayson, M.A.: ProppML: A Complete Annotation Scheme for Proppian Morphologies. In: CMN. OASICS, vol. 53, pp. 8:1–8:19. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik (2016)