Segmenting Narrative Synopses on Event Reporting Mode based on Heuristics on Constituency Parses

Pablo Gervás (pgervas@ucm.es)
Facultad de Informática, Universidad Complutense de Madrid, 28040 Madrid, Spain

ISSN 1613-0073

Keywords: modes of reporting events, embedded discourse, plot synopses, Stanford Core NLP

A narrative of relative complexity will often include events beyond simple facts that happen in the story world (such as plans, orders, or wishes of the characters) or references to events that happen in a different story world (stories or anecdotes being told by the characters). These correspond to different modes of reporting events. Identifying spans of text corresponding to different modes of reporting is a significant challenge. The present paper proposes a mechanism for segmenting the text of synopses of narrative plots into spans that correspond to different views of the storyworld in terms of temporal, spatial or modal coordinates. This is achieved by considering syntactic structure to identify cues for the start of embedded discourses and continuity over features such as tense, voice or mode to identify the points where the embedded discourse ends. This process can handle embedded discourses that span several sentences and recursive nesting of discourses. The solution is tested against a corpus of synopses hand-annotated for start and end of embedded discourses. Acceptable precision and recall metrics are reported.

Introduction

The events included in a narrative do not always constitute a sequence of events that are equally true in the storyworld in a steady succession. Sometimes the narrative refers to events that are not held to be true (but rather inform about wishes or plans of characters) or that were true at an earlier point in time. The start of the spans of text affected by such instances is often indicated linguistically by means of specific syntactic constructions, such as infinitive sentences, modal verbs or reported speech constructions. The end of the span of text that is affected may be indicated by changes in tense or voice. These phenomena constitute an important challenge for the interpretation of narrative text, because even assuming a certain ability to construct a conceptual representation of each event from the given text, the interpreter is still faced with the need to decide in what mode (as a truth or as a wish) and with what temporal and spatial coordinates to add it to the representation being constructed for the storyworld.

Examples of this type of sentence are: "The girl orders the gendarmes to destroy the garden" or "The forecaster warns the weather may be cold." These sentences describe a principal event (an order or a warning) which involves a secondary event (destroying the garden or the coming of cold weather). In each case, the meaning of the verb includes connotations about the distance between the secondary event and the principal event. Sometimes a sentence introduces an embedded discourse and then the reporting mode introduced by that embedded discourse continues over a number of sentences. Consider the following example:

A tsar has his queen and their son Ivan. The groom predicts [a sister will be born who shall be a terrible witch. The sister shall devour her father and mother and all people under their command.] Ivan asks the tsar for permission to go out for a walk.

The second sentence introduces a prophecy. The third sentence continues describing parts of the prophecy. The fourth sentence returns to the telling of the main story, and no longer refers to the prophecy. In terms of changes in reporting mode, the span of text corresponding to the prophecy should be identified as the one marked between square brackets. This is the type of task we want to address. We work on the hypothesis that such changes in mode of reporting may be flagged by changes in the features of the clauses that determine the time/space coordinates for the event: tense, voice, mode. In the example, the events corresponding to the prophecy are all presented in future tense.

The present paper explores the viability of developing such a computational solution for identifying spans of text corresponding to different modes of reporting events based on the Stanford Core NLP tool for English [1].

Previous Work

The distinction between modes of reporting for a narrative that is considered in this paper is based on pragmatic criteria: it is intended to inform a process of representation of the information contained in the narrative, and it is concerned with clustering the events reported into sets that are attributed a similar degree of certainty in the same possible world.

From the point of view of linguistics, there is a wide range of phenomena that might influence this kind of classification/clustering task. The most relevant is modality, defined as the way in which statements in a language may be marked in terms of their relation with reality or truth [2]. In the particular case of English, modality is most often expressed in terms of auxiliaries (such as may or can) but may also be expressed lexically, with verbs such as want or need.

Another relevant phenomenon is reported discourse, which appears when an agent reports discourse originally contributed by a different agent, and which may itself be an utterance or a belief [3]. Reported discourse may appear as direct speech (the reporting agent conveys the exact words of the reported agent) or as indirect speech (the reporting agent paraphrases what the reported agent said). Direct speech is typographically marked by presenting the reported discourse between quotes. Indirect speech is usually introduced by a complementizer (in English, that).

Instances of reported discourse may be further characterised by their attribution relations [4]. The attribution relation for a statement involving reported discourse explicitly encodes the reported discourse, the speaker (or source) and some cue indicating the attitude attributed to the speaker. Attributions are marked in English by the use of specific prepositions, specific lexical phrases, specific reporting verbs and the verbs preceding that-clauses [5].

The work of [4] develops a system for the automatic extraction of attribution relations. The proposed system relies essentially on a k-NN classifier for identifying verbs that act as cues for attribution, a conditional random field labeller to identify the span of text that corresponds to the reported discourse, and a logistic regression classifier to identify the entity that is presented as the source. A number of additional components help to refine the specific entities involved in cases where they are conveyed in the text by complex expressions spanning several tokens.

A different approach addresses the task of segmenting plot synopses by identifying turning points in the narrative [6]. This system relies on a corpus of synopses annotated with turning points (stages in the structure of the narrative), over which a neural network model is trained to identify the turning points. One important argument presented by these authors is that working on synopses presents significant advantages: (1) the shorter format makes annotation easier, so the effort is easier to scale, and (2) interannotator agreement is likely to be higher for synopses, given that synopses are written at a higher level of abstraction.

When characters in a story report events by themselves telling a story, each such telling is considered to introduce a new narrative level. The task of annotating narrative levels over long texts has been addressed by [7]. The authors outline the difficulties presented by a lack of corpora annotated with the relevant information and propose a solution based on extending a small annotated sample with synthetically created data. In terms of the task itself, they propose a model of the task as one of segmentation of the text by identifying boundaries of narrative levels in the text.

The procedure described in this paper relies on the constituency parse provided by the Stanford Core NLP toolkit [1]. The constituency parse for a sentence is used as the main data structure to drive the process because it respects the relative order of appearance of the words in the input text. This is relevant for the procedure employed for identifying separate spans of text that correspond to different modes of reporting events, as they will be assigned to continuous spans of the text.

Segmenting Narrative Text into Spans with Different Reporting Mode

A faithful reconstruction of the meaning of a given story requires that the different modes of reporting be identified and correctly assigned to each and every one of the events mentioned. One possible way of achieving this is by identifying the spans of text corresponding to each reporting mode and the set of events mentioned in each span.

A Corpus of Plot Synopses

Because the aim of this research initiative is to explore the application of text processing tools to the task of identifying the meaning of a narrative in terms of the storyworld it describes, it was considered more practical to operate on synopses of larger narratives rather than on the narratives themselves. This is based on the assumption that synopses attempt to condense the structure of the plot for a given narrative, leaving out the details that add value to the work but not necessarily to the general structure.

To inform this process, a small corpus of plot synopses has been considered. This corpus draws upon two different sources: the synopses of the set of plots for Russian folk tales originally analysed by Vladimir Propp [8] and the synopses of the set of French fairy tales by Madame d'Aulnoy [9] as annotated in terms of Proppian character functions by [10]. This corpus has been manually annotated to mark the start and end of all instances of embedded discourse which differ from other contributions to the text in the relative distance between the subject of the main clause and the reality of the embedded discourse. These distances sometimes involve modality, time or even a different storyworld.

The statistics of the spans of reporting mode that appear in the manual annotation are presented in Table 1 for the Propp tales and in Table 2 for the D'Aulnoy tales.

A Text Segmentation Process

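The process of text segmentation needs to be informed by the following factors: the set of events reported by the text (often a single sentence in the text reports more than one event, each represented by a particular subclause), the syntactic structure of sentences whose verbs indicate a specific reporting mode (such sentences are likely to imply a change in reporting mode between the event reported by the main clause and any reported in clauses subordinate to the main verb) and punctuation marks used to indicate relative groupings of subclauses (commas, brackets, or quotation marks). An extraction procedure is proposed that builds a sequence of subclauses for each sentence, each corresponding to a subtree describing an event. We develop heuristics for identifying points in the narrative where the events being told differ significantly in the mode in which they are reported, and consider these points as candidates for breaks between spans for different reporting modes.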

The different reporting modes that we have identified as being marked in this way are: events that occur in significantly different temporal coordinates (flashbacks, flashforwards), events that are narrated in a different modality (wishes, plans, prophecies, curses, orders), events that are specifically marked as being narrated by particular characters (indirect speech, reported speech).

Sub-Clause Extraction

A sentence with subordinate clauses will need to be broken down into its constituent parts. The relations between these parts are captured by assigning labels to identify subordinate clauses and adding these labels in argument positions in the main clause to represent the subordination relation.

The result of this process is a basic tree that represents the main clause as a simple structure with either noun phrases or subclause identification nodes as arguments. An example of this process is shown in Table 3. A recursive search for subclauses is applied. Recursion allows consideration of subclauses that appear as adjuncts to noun phrases in other clauses. The types of subclauses that may appear nested within a noun phrase in the constituency trees produced by the Stanford parser are: relative clauses ("[The boy who was hungry] asked for a snack"), past participle sentences ("[The horses bought at the market] proved worthless") and gerund sentences ("He found [a ring belonging to his mother]"). In each example, the part of the sentence corresponding to the subclause attached to the noun phrase has been marked in brackets. When extracting this type of subclause, duplication of the subject may be required. An example of this process is shown in Table 4.
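The extraction step described above can be illustrated with a minimal sketch. It assumes the constituency parse is available as a Penn-style bracketed string and treats every embedded `S` node as a subclause; the function names (`parse_tree`, `extract_subclauses`, `leaves`) and the clause numbering are hypothetical, and the sketch does not cover subclauses nested in noun phrases or the duplication of subjects mentioned in the text.

```python
import re

# Bracketed parse for "A nightingale foretells the son will humiliate the parents"
SENTENCE = (
    "(S (NP (DT A) (NN nightingale)) "
    "(VP (VBZ foretells) (SBAR "
    "(S (NP (DT the) (NN son)) "
    "(VP (MD will) (VP (VB humiliate) (NP (DT the) (NNS parents)))))"
    ")))"
)

def parse_tree(text):
    """Parse a bracketed string into nested lists of the form [label, child, ...]."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                          # skip "("
        node = [tokens[pos]]              # constituent label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())       # nested constituent
            else:
                node.append(tokens[pos])  # word (leaf)
                pos += 1
        pos += 1                          # skip ")"
        return node

    return read()

def extract_subclauses(tree):
    """Return (clause_id, tree) pairs, main clause first.

    Each embedded S node is replaced in its parent by a
    ["SUBCLAUSE", clause_id] placeholder (an assumed encoding).
    """
    clauses = []
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        clause_id = f"clause{counter}"
        slot = len(clauses)
        clauses.append(None)              # reserve the slot before descending
        clauses[slot] = (clause_id, strip(node))
        return clause_id

    def strip(node):
        out = [node[0]]
        for child in node[1:]:
            if isinstance(child, list) and child[0] == "S":
                out.append(["SUBCLAUSE", visit(child)])
            elif isinstance(child, list):
                out.append(strip(child))
            else:
                out.append(child)
        return out

    visit(tree)
    return clauses

def leaves(node):
    """Collect the words (and placeholders) of a clause in text order."""
    out = []
    for child in node[1:]:
        out.extend(leaves(child) if isinstance(child, list) else [child])
    return out

if __name__ == "__main__":
    for clause_id, clause in extract_subclauses(parse_tree(SENTENCE)):
        print(clause_id, " ".join(leaves(clause)))
```

Reserving the slot for a clause before descending into it keeps the main clause ahead of its embedded clauses, which mirrors the ordering the procedure relies on.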

Identifying Embedded Discourse Boundaries

For each sentence, the set of subclauses extracted from it is ordered so that relations of embedding are presented in the same order as in the discourse: main clause followed by embedded clause. The procedure is applied in succession to the resulting sequence of subclauses.

For each clause identified in the manner described above, the system extracts the following information: voice, tense and main verb. The system also considers the following additional aspects that play a role in boundary identification: whether the clause includes a subordinated clause, whether the main verb of the clause has connotations of change in reporting mode, whether the clause includes a clause-grouping punctuation sign (opening or closing). Finally, given the recursive nature of embedded discourse, the system maintains stack data structures that allow it to keep track at any given point of: the depth of levels of quotation and the depth of levels of embedding.

At each point, a look-ahead stage has been added to check whether: the following clause starts with quotation marks, or the following clause differs in tense/voice/mode from the current one.
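As an illustration of the per-clause information and the look-ahead check, the following sketch derives tense, voice and main verb from a POS-tagged clause and compares two consecutive clauses. The tag-based rules, the word lists and the names `ClauseFeatures`, `describe` and `look_ahead` are assumptions made for this example rather than the rules used by the system; mood is left out for brevity.

```python
from dataclasses import dataclass

# Assumed word lists, for illustration only.
BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}
AUXILIARIES = BE_FORMS | {"have", "has", "had"}
OPENING_QUOTES = {'"', "``", "\u201c"}

@dataclass(frozen=True)
class ClauseFeatures:
    tense: str
    voice: str
    main_verb: str
    starts_with_quote: bool

def describe(tagged):
    """Build features from a clause given as a list of (word, POS tag) pairs."""
    words = [w.lower() for w, _ in tagged]
    tags = [t for _, t in tagged]
    if any(t == "MD" and w in {"will", "shall"} for w, t in zip(words, tags)):
        tense = "future"
    elif "VBD" in tags:
        tense = "past"
    elif "VBZ" in tags or "VBP" in tags:
        tense = "present"
    else:
        tense = "unknown"
    # Crude passive test: a form of "be" directly followed by a past participle.
    passive = any(
        words[i] in BE_FORMS and tags[i + 1] == "VBN"
        for i in range(len(tagged) - 1)
    )
    verbs = [w for w, t in zip(words, tags) if t.startswith("VB")]
    content_verbs = [w for w in verbs if w not in AUXILIARIES]
    main_verb = content_verbs[0] if content_verbs else (verbs[-1] if verbs else "")
    return ClauseFeatures(
        tense=tense,
        voice="passive" if passive else "active",
        main_verb=main_verb,
        starts_with_quote=bool(tagged) and tagged[0][0] in OPENING_QUOTES,
    )

def look_ahead(current, following):
    """True if the following clause opens a quotation or differs in tense or voice."""
    return (
        following.starts_with_quote
        or following.tense != current.tense
        or following.voice != current.voice
    )

prophecy = [("The", "DT"), ("sister", "NN"), ("shall", "MD"),
            ("devour", "VB"), ("her", "PRP$"), ("father", "NN")]
main_story = [("Ivan", "NNP"), ("asks", "VBZ"), ("the", "DT"),
              ("tsar", "NN"), ("for", "IN"), ("permission", "NN")]

if __name__ == "__main__":
    print(look_ahead(describe(prophecy), describe(main_story)))
```

In the tsar example from the Introduction, the switch from future tense in the prophecy to present tense in the main story is what the look-ahead check picks up.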

Table 5: Metrics on precision and recall over extracted features for the set of Propp tales. Values are shown for span starts (S), span ends (E) and specific spans as annotated in the corpus (sp). In each case, precision is indicated by a P suffix and recall by an R suffix.

| SP | SR | EP | ER | spP | spR |
|------|------|------|------|------|------|
| 0.84 | 0.62 | 0.84 | 0.62 | 0.95 | 0.92 |

The heuristic that decides at each point of the discourse whether to consider the start of a span with a different reporting mode for events relies on the following conditions: whether the syntactic structure of the given clause includes embedded subclauses, whether the main verb of the clause is considered to change the mode of reporting (as defined by a resource file that lists verbs of this type) and whether the clause starts with opening quotation marks. The decision on the end of a span of discourse relies on the following conditions: whether the clause ahead involves changes in tense, voice and mood; whether the current clause already opened a span of reported discourse (which would make it a main clause and necessarily followed by at least one subordinate subclause) and whether closing quotation marks appear at the end of the subclause.
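The start and end conditions can be sketched as a single pass over the sequence of clauses with a stack of open spans. This is a simplified reading under stated assumptions: the verb list `MODE_VERBS` stands in for the resource file and is invented for the example, only tense is compared in the look-ahead, and quotation marks are ignored. The `Clause` record and the `segment` function are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the resource file of mode-changing verbs.
MODE_VERBS = {"predict", "foretell", "order", "warn", "wish"}

@dataclass
class Clause:
    text: str
    tense: str
    verb: str            # lemma of the main verb
    has_subclause: bool

def segment(clauses):
    """Return (start, end) index pairs, inclusive, for embedded spans."""
    spans = []
    stack = []           # start indices of the spans that are currently open
    for i, clause in enumerate(clauses):
        following = clauses[i + 1] if i + 1 < len(clauses) else None
        opens = (
            following is not None
            and clause.has_subclause
            and clause.verb in MODE_VERBS
        )
        if opens:
            stack.append(i + 1)      # the embedded span starts at the next clause
            continue                 # a clause that opens a span cannot close one
        if following is None:
            while stack:             # end of text closes every open span
                spans.append((stack.pop(), i))
        elif stack and following.tense != clause.tense:
            spans.append((stack.pop(), i))   # close the innermost span
    return spans

STORY = [
    Clause("A tsar has his queen and their son Ivan", "present", "have", False),
    Clause("The groom predicts", "present", "predict", True),
    Clause("a sister will be born", "future", "bear", True),
    Clause("who shall be a terrible witch", "future", "be", False),
    Clause("The sister shall devour her father and mother", "future", "devour", False),
    Clause("Ivan asks the tsar for permission to go out for a walk", "present", "ask", False),
]

if __name__ == "__main__":
    for start, end in segment(STORY):
        print(" / ".join(c.text for c in STORY[start:end + 1]))
```

The stack allows a span opened inside another span to be closed first, which is one way to accommodate the recursive nesting of discourses that the procedure is meant to handle.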

Discussion

The described solution can be discussed from two points of view: in terms of recall and precision metrics of the outcomes of the solution with respect to the manually annotated reference corpus and in terms of how it compares with existing solutions for similar tasks.

Metrics over Reporting Mode Spans

To provide a quantitative evaluation of the quality of the results, we consider: precision and recall over positions of embedded story starts, precision and recall over positions of embedded story ends, and precision and recall over bags of words identified for specific spans as annotated in the reference corpus. The current outputs for the Propp tales are shown in Table 5 and current outputs for the D'Aulnoy tales are shown in Table 6.
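For concreteness, precision and recall over boundary positions and over bags of words can be computed as in the sketch below. The exact matching criteria used in the evaluation are not spelled out in the text, so this assumes exact position matches and multiset overlap of words; the function name `precision_recall` and the sample data are illustrative only.

```python
from collections import Counter

def precision_recall(predicted, reference):
    """Precision and recall of predicted items against reference items.

    Both arguments are treated as multisets, so the same function serves
    for boundary positions and for bags of words.
    """
    predicted, reference = Counter(predicted), Counter(reference)
    overlap = sum((predicted & reference).values())
    precision = overlap / sum(predicted.values()) if predicted else 0.0
    recall = overlap / sum(reference.values()) if reference else 0.0
    return precision, recall

# Invented start positions (token offsets) for a system and a reference.
predicted_starts = [3, 10, 17]
reference_starts = [3, 10, 21, 30]

# Invented bags of words for one span: system output versus annotation.
predicted_words = "a sister will be born".split()
reference_words = "a sister will be born who shall be a terrible witch".split()

if __name__ == "__main__":
    print(precision_recall(predicted_starts, reference_starts))
    print(precision_recall(predicted_words, reference_words))
```

Using multisets rather than sets means that a word repeated in the reference span only counts as recovered as many times as it appears in the predicted span.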

Results for the two different data sets are shown separately due to the significant differences that exist between them. The synopses for Propp tales are telegraphic transcriptions of the formal analysis described in shorthand in Propp's book. The synopses for D'Aulnoy's tales are more elaborate descriptions provided by Williams in her thesis [10]. The differences in the complexity of the language employed give rise to the differences in scores. Nevertheless, results are comparable across the two sets.

Reasonably high (0.84 and 0.90) precision on starts means that most of the starts of embedded stories found by the parse are indeed starts that appear marked in the reference. The lower values (0.62 and 0.40) for recall on starts mean that a certain proportion of the starts marked are not being located correctly. This suggests that there are further features that identify the start of these spans, beyond verbs that indicate changes in reporting mode, that would need to be considered. Reasonably high (0.84 and 0.85) precision on ends means that most of the ends of embedded stories found by the parse are indeed ends that appear marked in the reference. Low (0.62 and 0.37) recall on ends means that a large proportion of the ends marked are not being located correctly. However, it is important to note that, because the procedure treats the location of starts and ends separately, the values for location of ends will be considerably affected by any errors in the identification of the corresponding starts. Relatively high values on precision (0.95 and 0.80) and recall (0.92 and 0.89) for the identification of words belonging to specific spans suggest that, once the values for locating starts and ends are improved, the procedure would work well.

It is also important to consider that some of the incorrect diagnoses leading to low values in precision and recall may be traced back to error rates in the parser. The results of the parser chosen for the task very often misrepresent the relations between the constituents of the sentence. For instance, the parses of sentences such as "the girl orders the gendarmes to destroy the garden" treat "the gendarmes" as subject of the clause "to destroy" and not as direct object of the clause "orders". Such errors may explain why precision and recall for retrieving words for a specific span do not reach maximal values. A solution enhanced with the results of a dependency parse will be considered as further work and should help improve results.

Comparison with Prior Work

The fundamental challenge for the development of adequate machine learning solutions to tasks of segmenting/classifying reported discourse in text has been identified as the lack of training resources of appropriate size and coverage, both in the context of attribution relations [4] and narrative levels [7]. The corpus developed by [4] constitutes a valuable resource for the specific task of identifying attribution relations, but it does not provide coverage for phenomena beyond that. The work reported in [7] proposes a solution that relies on the generation of synthetic data to automatically augment a small data set used as seed. The resulting data set is valuable for the annotation of narrative levels but does not consider other phenomena.

As long as there are no data sets large enough to train machine learning solutions to the task, the proposed solution may provide a simple baseline that obtains results with acceptable values of precision and recall over basic features. Whereas the segmentations generated are clearly far from perfect, they can provide a valuable starting point for any efforts on developing corpora with the assistance of human annotators. At worst, they can be considered as a way to help create the seed data sets required for automatic augmentation via data synthesis.

Conclusions

A procedure for detecting spans of text that correspond to different modes of reporting events has been proposed. The proposed procedure operates directly on a syntactic parse of a story, without requiring any special depths of semantic analysis. It is also designed to operate on inputs based on their syntactic structure and a lexical resource that identifies verbs which indicate a change in reporting mode. These characteristics allow it to operate as a baseline solution for obtaining an initial segmentation in the absence of sufficient volumes of annotated data to train machine learning solutions.

Table 1: Statistics of reporting mode spans in the manual annotation of the corpus of Propp tales.

| Statistic | Value |
|-----------|-------|
| Total number of sentences | 344 |
| Total number of nestings | 66 |
| Maximum depth of nesting | 3 |
| Average depth of nesting | 1.05 |
| Total number of words | 2607 |
| Total number of words in embedded levels | 401 |
| Percentage of the total number of words that appeared embedded | 15/100 |

Table 2: Statistics of reporting mode spans in the manual annotation of the corpus of Madame d'Aulnoy tales.

| Statistic | Value |
|-----------|-------|
| Total number of sentences | 803 |
| Total number of nestings | 178 |
| Maximum depth of nesting | 4 |
| Average depth of nesting | 1.11 |
| Total number of words | 5390 |
| Total number of words in embedded levels | 867 |
| Percentage of the total number of words that appeared embedded | 16/100 |

Table 3: Example of input sentence and breakdown into subclauses, showing relative order of presentation and relative co-indexing.

Original parse:
(S (NP DT/A NN/nightingale) (VP VBZ/foretells (SBAR (S (NP DT/the NN/son) (VP MD/will (VP VB/humiliate (NP DT/the NNS/parents)))))))

Subclauses:
clause2: (S (NP DT/A NN/nightingale) (VP VBZ/foretells (SBAR SUBCLAUSE/clause3)))
clause3: (S (NP DT/the NN/son) (VP MD/will (VP VB/humiliate (NP DT/the NNS/parents))))

each represented by a particular subclause), the syntactic structure of sentences whose verbs indicate a specific reporting mode (such sentences are likely to imply a change in reporting mode between the event reported by the main clause and any reported in clauses subordinate to the main verb) and punctuation marks used to indicate relative groupings of subclauses (commas, brackets, or quotation marks).

Table 4: Example of subclause breakdown for past participle sentence.

Original parse:
(S (NP (NP CD/two NNS/horses) (VP VBN/bought (PP IN/at (NP DT/the NN/market)))) (VP VBD/had (VP VBN/proved (S (ADJP JJ/worthless)))))

Subclauses:
clause5: (S (NP (NP CD/two NNS/horses)) (VP VBN/bought (PP IN/at (NP DT/the NN/market))))
clause6: (S (NP (NP CD/two NNS/horses)) (VP VBD/had (VP VBN/proved (S (ADJP JJ/worthless)))))

Table 6: Metrics on precision and recall over extracted features for the set of D'Aulnoy tales. Values are shown for span starts (S), span ends (E) and specific spans as annotated in the corpus (sp). In each case, precision is indicated by a P suffix and recall by an R suffix.

| Corpus | SP | SR | EP | ER | spP | spR |
|--------|------|------|------|------|------|------|
| Propp tales (Table 5) | 0.84 | 0.62 | 0.84 | 0.62 | 0.95 | 0.92 |
| D'Aulnoy tales (Table 6) | 0.90 | 0.40 | 0.85 | 0.37 | 0.80 | 0.89 |

Acknowledgments

This paper has been partially supported by the CANTOR project (PID2019-108927RB-I00) funded by the Spanish Ministry of Science and Innovation.

References

[1] C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, D. McClosky, The Stanford CoreNLP Natural Language Processing Toolkit, in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014, Baltimore, MD, USA, System Demonstrations, The Association for Computer Linguistics, 2014.

[2] P. Portner, Modality, OUP Oxford, 2009.

[3] D. Y. Oshima, Perspectives in reported discourse, Ph.D. thesis, Stanford University, Stanford, 2006.

[4] S. Pareti, Attribution: a computational approach, Ph.D. thesis, The University of Edinburgh, 2015.

[5] A. C. Murphy, Markers of attribution in English and Italian opinion articles: A comparative corpus-based study, ICAME Journal 29 (2005).

[6] P. Papalampidi, F. Keller, M. Lapata, Movie plot analysis via turning point identification, arXiv preprint arXiv:1908.10328 (2019).

[7] N. Reiter, J. Sieker, S. Guhr, E. Gius, S. Zarrieß, Exploring text recombination for automatic narrative level detection, in: Proceedings of the Thirteenth Language Resources and Evaluation Conference, European Language Resources Association, Marseille, France, 2022.

[8] V. I. Propp, Morphology of the folktale, University of Texas Press, 1968.

[9] M. Aulnoy; L. Duprat-Duverger, Contes des fées, 1866.

[10] E. D. Williams, The Fairy Tales by Madame d'Aulnoy, Ph.D. thesis, Rice University, Houston, Texas, 1982.