Segmenting Narrative Synopses on Event Reporting
Mode based on Heuristics on Constituency Parses
Pablo Gervás
Facultad de Informática, Universidad Complutense de Madrid, Madrid, 28040 Spain


                                      Abstract
                                      A narrative of relative complexity will often include events beyond simple facts that happen in the
                                      story world–such as plans, orders, wishes of the characters– or references to events that happen in a
                                      different story world–stories or anecdotes being told by the characters. These correspond to different
                                      modes of reporting events. Identifying spans of text corresponding to different modes of reporting is
                                      a significant challenge. The present paper proposes a mechanism for segmenting the text of synopses
                                      of narrative plots into spans that correspond to different views of the storyworld in terms of temporal,
                                      spatial or modal coordinates. This is achieved by considering syntactic structure to identify cues for the
                                      start of embedded discourses and continuity over features such as tense, voice or mode to identify the
                                      points where the embedded discourse ends. This process can handle embedded discourses that span over
                                      several sentences and recursive nesting of discourses. The solution is tested against a corpus of synopses
                                      hand-annotated for start and end of embedded discourses. Acceptable precision and recall metrics are
                                      reported.

                                      Keywords
                                      modes of reporting events, embedded discourse, plot synopses, Stanford Core NLP


1. Introduction
The events included in a narrative do not always constitute a sequence of events that are equally
true in the storyworld in a steady succession. Sometimes the narrative refers to events that are
not held to be true–but rather inform about wishes or plans of characters–or that were true
at an earlier point in time. The start of the span of text affected by such instances is often
indicated linguistically by means of specific syntactic constructions, such as infinitive sentences,
modal verbs or reported speech constructions. The end of the span of text that is affected may
be indicated by changes in tense or voice. These phenomena constitute an important challenge
for the interpretation of narrative text, because even assuming a certain ability to construct a
conceptual representation of each event from the given text, the interpreter is still faced with
the need to decide in what mode–as a truth or as a wish–and with what temporal and spatial
coordinates to add it to the representation being constructed for the storyworld.


In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (eds.): Proceedings of the Text2Story’23 Workshop, Dublin
(Republic of Ireland), 2-April-2023.
* Corresponding author.
Email: pgervas@ucm.es (P. Gervás)
URL: http://nil.fdi.ucm.es// (P. Gervás)
ORCID: 0000-0003-4906-9837 (P. Gervás)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073, http://ceur-ws.org


  Examples of this type of sentence are: “the girl orders the gendarmes to destroy the garden” or
“The forecaster warns the weather may be cold.” These sentences describe a principal event–an
order or a warning–which involves a secondary event–destroying the garden or the coming of
cold weather. In each case, the meaning of the verb includes connotations about the distance
between the secondary event and the principal event. Sometimes a sentence introduces an
embedded discourse and then the reporting mode introduced by that embedded discourse
continues over a number of sentences. Consider the following example:

         A tsar has his queen and their son Ivan. The groom predicts [a sister will be born
      who shall be a terrible witch. The sister shall devour her father and mother and
      all people under their command.] Ivan asks the tsar for permission to go out for a
      walk.

   The second sentence introduces a prophecy. The third sentence continues describing parts
of the prophecy. The fourth sentence returns to the telling of the main story, and no longer
refers to the prophecy. In terms of changes in reporting mode, the span of text corresponding
to the prophecy should be identified as the one marked between square brackets. This is the
type of task we want to address. We work on the hypothesis that such changes in mode of
reporting may be flagged by changes in the features of the clauses that determine the time/space
coordinates for the event: tense, voice, mode. In the example, the events corresponding to the
prophecy are all presented in future tense.
   The present paper explores the viability of developing such a computational solution for
identifying spans of text corresponding to different modes of reporting events based on the
Stanford Core NLP tool for English [1].


2. Previous Work
The distinction between modes of reporting for a narrative that is considered in this paper
is based on pragmatic criteria: it is intended to inform a process of representation of the
information contained in the narrative, and it is concerned with clustering the events reported
into sets that are attributed a similar degree of certainty in the same possible world.
   From the point of view of linguistics, there is a wide range of phenomena that might
influence this kind of classification/clustering task. The most relevant is modality, defined as
the way in which statements in a language may be marked in terms of their relation with reality
or truth [2]. In the particular case of English, modality is most often expressed by means of
auxiliaries–such as may or can–but may also be expressed lexically, with verbs such as want or
need.
   Another relevant phenomenon is reported discourse, which appears when an agent reports
discourse originally contributed by a different agent, and which may itself be an utterance or a
belief [3]. Reported discourse may appear as direct speech–the reporting agent conveys the
exact words of the reported agent–or as indirect speech–the reporting agent paraphrases what
the reported agent said. Direct speech is typographically marked by presenting the reported
discourse between quotes. Indirect speech is usually introduced by a complementizer, which in
English is that.


   Instances of reported discourse may be further characterised by their attribution relations
[4]. The attribution relation for a statement involving reported discourse explicitly encodes
the reported discourse, the speaker (or source) and some cue indicating the attitude attributed
to the speaker. Attributions are marked in English by the use of specific prepositions, specific
lexical phrases, specific reporting verbs and the verbs preceding that-clauses [5].
   The work of [4] develops a system for the automatic extraction of attribution relations. The
proposed system relies essentially on a k-NN classifier for identifying verbs that act as cues for
attribution, a conditional random field labeller to identify the span of text that corresponds to
the reported discourse, and a logistic regression classifier to identify the entity that is presented
as the source. A number of additional components help to refine the specific entities involved
in cases where they are conveyed in the text by complex expressions spanning several tokens.
   A different approach addresses the task of segmenting plot synopses by identifying turning
points in the narrative [6]. This system relies on a corpus of synopses annotated with turning
points–stages in the structure of the narrative–over which a neural network model is trained to
identify the turning points. One important argument presented by these authors is that working
on synopses presents significant advantages: (1) the shorter format makes annotation easier,
so the effort is easier to scale, (2) interannotator agreement is likely to be higher for synopses,
given that synopses are written at a higher level of abstraction.
   When characters in a story report events by themselves telling a story, each such telling
is considered to introduce a new narrative level. The task of annotating narrative levels over
long texts has been addressed by [7]. The authors outline the difficulties presented by a lack of
corpora annotated with the relevant information and propose a solution based on extending a
small annotated sample with synthetically created data. In terms of the task itself, they propose
a model of the task as one of segmentation of the text by identifying boundaries of narrative
levels in the text.
   The procedure described in this paper relies on the constituency parse provided by the
Stanford Core NLP toolkit [1]. The constituency parse for a sentence is used as the main data
structure to drive the process because it respects the relative order of appearance of the words in
the input text. This is relevant for the procedure employed for identifying separate spans of text
that correspond to different modes of reporting events, as they will be assigned to continuous
spans of the text.
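
The order-preserving property mentioned above can be illustrated with a minimal sketch. It assumes that the parse is available as a Penn-style bracketed string, which is one of the output formats commonly produced by constituency parsers; the string below is written by hand for illustration and is not actual output of the toolkit, and the names parse_bracketed and leaves are hypothetical helpers, not part of any library.

```python
from typing import List, Union

Tree = Union[str, list]  # a word, or [label, child, child, ...]


def parse_bracketed(text: str) -> Tree:
    """Parse a Penn-style bracketed string such as '(S (NP (DT A)))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Tree:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok                  # a plain word
        node = [tokens[pos]]            # the constituent label
        pos += 1
        while tokens[pos] != ")":
            node.append(read())         # children, in textual order
        pos += 1                        # consume the closing ')'
        return node

    return read()


def leaves(tree: Tree) -> List[str]:
    """Return the words of the tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    words: List[str] = []
    for child in tree[1:]:
        words.extend(leaves(child))
    return words


# Hand-written bracketing (an assumption, not real parser output).
PARSE = (
    "(S (NP (DT A) (NN nightingale)) "
    "(VP (VBZ foretells) (SBAR (S (NP (DT the) (NN son)) "
    "(VP (MD will) (VP (VB humiliate) (NP (DT the) (NNS parents))))))))"
)

tree = parse_bracketed(PARSE)
print(" ".join(leaves(tree)))
```

Because the children of every node are stored in textual order, any subtree covers a continuous stretch of the sentence, which is the property the segmentation procedure depends on.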


3. Segmenting Narrative Text into Spans with Different
   Reporting Mode
A faithful reconstruction of the meaning of a given story requires that the different modes of
reporting be identified and correctly assigned to each and every one of the events mentioned.
One possible way of achieving this is by identifying the spans of text corresponding to each
reporting mode and the set of events mentioned in each span.


Table 1
Statistics of reporting mode spans in the manual annotation of the corpus of Propp tales.
                   Total number of sentences:                                 344
                   Total number of nestings:                                   66
                   Maximum depth of nesting:                                    3
                   Average depth of nesting:                                 1.05
                   Total number of words:                                    2607
                   Total number of words in embedded levels:                  401
                   Percentage of total words that appeared embedded:          15%

Table 2
Statistics of reporting mode spans in the manual annotation of the corpus of Madame d’Aulnoy tales.
                   Total number of sentences:                                 803
                   Total number of nestings:                                  178
                   Maximum depth of nesting:                                    4
                   Average depth of nesting:                                 1.11
                   Total number of words:                                    5390
                   Total number of words in embedded levels:                  867
                   Percentage of total words that appeared embedded:          16%


3.1. A Corpus of Plot Synopses
Because the aim of this research initiative is to explore the application of text processing tools
to the task of identifying the meaning of a narrative in terms of the storyworld it describes, it
was considered more practical to operate on synopses of larger narratives rather than on the
narratives themselves. This is based on the assumption that synopses attempt to condense the
structure of the plot for a given narrative, leaving out the details that add value to the work but
not necessarily to the general structure.
   To inform this process, a small corpus of plot synopses has been considered. This corpus
draws upon two different sources: the synopses of the set of plots for Russian folk tales originally
analysed by Vladimir Propp [8] and the synopses of the set of French fairy tales by Madame
d’Aulnoy [9] as annotated in terms of Proppian character functions by [10]. This corpus has
been manually annotated to mark the start and end of all instances of embedded
discourse which differ from other contributions to the text in the relative distance between the
subject of the main clause and the reality of the embedded discourse. These distances sometimes
involve modality, time or even a different storyworld.
   The statistics of the spans of reporting mode that appear in the manual annotation are
presented in Table 1 for the Propp tales and in Table 2 for the D’Aulnoy tales.
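
As an illustration of how statistics of this kind can be derived, the following sketch counts words, nestings and depth over a text in which embedded spans are marked with square brackets, as in the example given in the Introduction. The bracket convention, the whitespace-based word count and the function names are assumptions made for this sketch; the actual annotation format of the corpus is not described here. The average depth is omitted because its exact definition is not stated.

```python
import re


def span_statistics(annotated: str) -> dict:
    """Count words and bracketed spans in a text annotated with [ and ]."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", annotated)
    depth = max_depth = nestings = total = embedded = 0
    for tok in tokens:
        if tok == "[":
            depth += 1
            nestings += 1
            max_depth = max(max_depth, depth)
        elif tok == "]":
            depth -= 1
        else:
            total += 1                  # whitespace-delimited word
            if depth > 0:
                embedded += 1           # word inside some embedded span
    return {
        "total_words": total,
        "embedded_words": embedded,
        "nestings": nestings,
        "max_depth": max_depth,
    }


def percentage(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


EXAMPLE = (
    "A tsar has his queen and their son Ivan. The groom predicts [a sister "
    "will be born who shall be a terrible witch. The sister shall devour her "
    "father and mother and all people under their command.] Ivan asks the "
    "tsar for permission to go out for a walk."
)

stats = span_statistics(EXAMPLE)
print(stats)
print(percentage(stats["embedded_words"], stats["total_words"]))
```

Applied to the word counts reported in the tables, the same rounding gives percentage(401, 2607) = 15 for the Propp tales and percentage(867, 5390) = 16 for the D’Aulnoy tales.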

3.2. A Text Segmentation Process
The process of text segmentation needs to be informed by the following factors: the set of
events reported by the text (often a single sentence in the text reports more than one event,
each represented by a particular subclause), the syntactic structure of sentences whose verbs
indicate a specific reporting mode (such sentences are likely to imply a change in reporting
mode between the event reported by the main clause and any reported in clauses subordinate
to the main verb) and punctuation marks used to indicate relative groupings of subclauses
(commas, brackets, or quotation marks).

Table 3
Example of input sentence and breakdown into subclauses, showing relative order of presentation and
relative co-indexing.
       Original parse            Subclauses
       -                         clause2                    clause3
       S                         S                          S
         NP                        NP                         NP
           DT/A                      DT/A                       DT/the
           NN/nightingale            NN/nightingale             NN/son
         VP                        VP                         VP
           VBZ/foretells             VBZ/foretells              MD/will
           SBAR                      SBAR                       VP
             S                         SUBCLAUSE/clause3          VB/humiliate
               NP                                                 NP
                 DT/the                                             DT/the
                 NN/son                                             NNS/parents
               VP
                 MD/will
                 VP
                   VB/humiliate
                   NP
                     DT/the
                     NNS/parents

   An extraction procedure is proposed that builds a sequence of subclauses for each sentence–
each corresponding to a subtree describing an event. We develop heuristics for identifying
points in the narrative where the events being told differ significantly in the mode in which they
are reported, and consider these points as candidates for breaks between spans for different
reporting modes.
   The different reporting modes that we have identified as being marked in this way are: events
that occur in significantly different temporal coordinates (flashbacks, flashforwards), events
that are narrated in a different modality (wishes, plans, prophecies, curses, orders), events that
are specifically marked as being narrated by particular characters (indirect speech, reported
speech).

3.2.1. Sub-Clause Extraction
A sentence with subordinate clauses will need to be broken down into its constituent parts. The
relations between these parts are captured by assigning labels to identify subordinate clauses
and adding these labels in argument positions in the main clause to represent the subordination
relation.
   The result of this process is a basic tree that represents the main clause as a simple structure
with either noun phrases or subclause identification nodes as arguments. An example of this
process is shown in Table 3.
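
The breakdown shown in Table 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: trees are represented as nested Python lists, only clauses introduced by an SBAR node are extracted (infinitive clauses attached directly to a verb phrase are ignored), and the function name extract_subclauses and the counter_start parameter are hypothetical.

```python
def extract_subclauses(tree, counter_start=1):
    """Replace each S under an SBAR by a SUBCLAUSE placeholder.

    Returns a dict mapping clause labels to trees, ordered so that a main
    clause precedes the clauses embedded in it.
    """
    clauses = {}
    counter = [counter_start]

    def new_label():
        label = f"clause{counter[0]}"
        counter[0] += 1
        clauses[label] = None           # reserve the position in the ordering
        return label

    def rewrite(node):
        if isinstance(node, str):
            return node
        label, children = node[0], node[1:]
        rewritten = [label]
        for child in children:
            is_clause = not isinstance(child, str) and child[0] == "S"
            if label == "SBAR" and is_clause:
                sub_label = new_label()
                clauses[sub_label] = rewrite(child)
                rewritten.append(["SUBCLAUSE", sub_label])
            else:
                rewritten.append(rewrite(child))
        return rewritten

    main_label = new_label()
    clauses[main_label] = rewrite(tree)
    return clauses


SENTENCE = [
    "S",
    ["NP", ["DT", "A"], ["NN", "nightingale"]],
    ["VP",
     ["VBZ", "foretells"],
     ["SBAR",
      ["S",
       ["NP", ["DT", "the"], ["NN", "son"]],
       ["VP", ["MD", "will"],
        ["VP", ["VB", "humiliate"],
         ["NP", ["DT", "the"], ["NNS", "parents"]]]]]]],
]

# Numbering starts at 2 only to reproduce the labels used in Table 3.
clauses = extract_subclauses(SENTENCE, counter_start=2)
for label, subtree in clauses.items():
    print(label, subtree)
```

The placeholder node keeps the subordination relation explicit in the main clause, while the insertion order of the dictionary preserves the order of presentation in the discourse.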


Table 4
Example of subclause breakdown for past participle sentence.
       Original parse            Subclauses
       -                         clause5                    clause6
       S                         S                          S
         NP                        NP                         NP
           NP                        NP                         NP
             CD/two                    CD/two                     CD/two
             NNS/horses                NNS/horses                 NNS/horses
           VP                      VP                         VP
             VBN/bought              VBN/bought                 VBD/had
             PP                      PP                         VP
               IN/at                   IN/at                      VBN/proved
               NP                      NP                         S
                 DT/the                  DT/the                     ADJP
                 NN/market               NN/market                    JJ/worthless
         VP
           VBD/had
           VP
             VBN/proved
             S
               ADJP
                 JJ/worthless

   A recursive search for subclauses is applied. Recursion allows consideration of subclauses
that appear as adjuncts to noun phrases in other clauses. The types of subclauses that may
appear nested within a noun phrase in the constituency trees produced by the Stanford parser
are: relative clauses (“[The boy who was hungry] asked for a snack”), past participle sentences
(“[The horses bought at the market] proved worthless”) and gerund sentences (“He found [a
ring belonging to his mother]”). In each example, the part of the sentence corresponding to the
subclause attached to the noun phrase has been marked in brackets. When extracting this type
of subclause, duplication of the subject may be required. An example of this process is shown
in Table 4.
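
The duplication of the subject illustrated in Table 4 can be sketched for the past participle case. The code below is a simplified illustration that only covers a sentence of the form [S, NP, VP] whose subject is a noun phrase followed by a modifying verb phrase; relative clauses and gerund sentences would need analogous patterns. The nested-list representation and the function names are assumptions of this sketch.

```python
def split_reduced_relative(sentence):
    """Split a sentence whose subject NP carries a participle clause.

    Returns [embedded clause, main clause], each with its own copy of the
    subject, or [sentence] unchanged when the pattern does not apply.
    """
    if len(sentence) != 3:
        return [sentence]
    label, subject, predicate = sentence
    if label != "S" or subject[0] != "NP" or len(subject) != 3:
        return [sentence]
    head, modifier = subject[1], subject[2]
    if head[0] != "NP" or modifier[0] != "VP":
        return [sentence]
    embedded = ["S", ["NP", head], modifier]   # subject + participle clause
    main = ["S", ["NP", head], predicate]      # subject + main predicate
    return [embedded, main]


def words(node):
    """Return the words under a node; preterminals are [tag, word]."""
    if len(node) == 2 and isinstance(node[1], str):
        return [node[1]]
    result = []
    for child in node[1:]:
        result.extend(words(child))
    return result


SENTENCE = [
    "S",
    ["NP",
     ["NP", ["CD", "two"], ["NNS", "horses"]],
     ["VP", ["VBN", "bought"],
      ["PP", ["IN", "at"], ["NP", ["DT", "the"], ["NN", "market"]]]]],
    ["VP", ["VBD", "had"],
     ["VP", ["VBN", "proved"], ["S", ["ADJP", ["JJ", "worthless"]]]]],
]

embedded, main = split_reduced_relative(SENTENCE)
print(" ".join(words(embedded)))
print(" ".join(words(main)))
```

Both resulting clauses share the noun phrase "two horses", which corresponds to the duplicated subject in the clause5 and clause6 columns of Table 4.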

3.3. Identifying Embedded Discourse Boundaries
For each sentence, the set of subclauses extracted from it is ordered so that relations of
embedding are presented in the same order as in the discourse: main clause followed by
embedded clause. The procedure is applied in succession to the resulting sequence of subclauses.
   For each clause identified in the manner described above, the system extracts the following
information: voice, tense and main verb. The system also considers the following additional
aspects that play a role in boundary identification: whether the clause includes a subordinated
clause, whether the main verb of the clause has connotations of change in reporting mode,
and whether the clause includes a clause-grouping punctuation sign (opening or closing). Finally,
given the recursive nature of embedded discourse, the system maintains stack data structures
that allow it to keep track at any given point of the depth of levels of quotation and the depth
of levels of embedding.
   At each point, a look-ahead stage checks whether the following clause
starts with quotation marks, or whether the following clause differs in tense/voice/mode from the current
one.
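
A possible way of deriving the clause features and the look-ahead comparison is sketched below. The rules that map Penn Treebank part-of-speech tags to tense and voice are simplifying assumptions introduced for illustration (for instance, "will" and "shall" are taken to signal future tense, and a form of "be" followed by a past participle is taken to signal passive voice); they are not the rules used by the system described here, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseFeatures:
    tense: str
    voice: str
    main_verb: str


BE_FORMS = {"am", "is", "are", "was", "were", "be", "been", "being"}


def clause_features(tagged):
    """Derive features from a clause given as (word, tag) pairs in text order."""
    tense, voice, main_verb = "unknown", "active", ""
    after_be = False
    for word, tag in tagged:
        lower = word.lower()
        if tag == "MD" and lower in {"will", "shall"}:
            tense = "future"
        elif tag == "VBD" and tense == "unknown":
            tense = "past"
        elif tag in {"VBZ", "VBP"} and tense == "unknown":
            tense = "present"
        if tag.startswith("VB"):
            if tag == "VBN" and after_be:
                voice = "passive"
            main_verb = lower           # the last verb of the chain is kept
            after_be = lower in BE_FORMS
    return ClauseFeatures(tense, voice, main_verb)


def mode_changes(current, following):
    """Look-ahead test: does the next clause differ in tense or voice?"""
    return (current.tense, current.voice) != (following.tense, following.voice)


prophecy = clause_features([
    ("The", "DT"), ("sister", "NN"), ("shall", "MD"), ("devour", "VB"),
    ("her", "PRP$"), ("father", "NN"),
])
main_story = clause_features([
    ("Ivan", "NNP"), ("asks", "VBZ"), ("the", "DT"), ("tsar", "NN"),
])
print(prophecy)
print(main_story)
print(mode_changes(prophecy, main_story))
```

In this sketch a difference in voice alone also counts as a change, so a passive clause followed by an active one in the same tense would be flagged; how the different features are weighed against each other in practice is a design decision that this illustration does not settle.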


Table 5
Metrics on precision and recall over extracted features for the set of Propp tales. Values are shown for
span starts (S), span ends (E) and specific spans as annotated in the corpus (sp). In each case, precision
is indicated by a P suffix and recall by a R suffix.
                                 SP     SR     EP      ER     spP    spR
                                 0.84   0.62   0.84    0.62   0.95   0.92

Table 6
Metrics on precision and recall over extracted features for the set of D’Aulnoy tales. Values are shown for
span starts (S), span ends (E) and specific spans as annotated in the corpus (sp). In each case, precision
is indicated by a P suffix and recall by a R suffix.
                                 SP     SR     EP      ER     spP    spR
                                 0.90   0.40   0.85    0.37   0.80   0.89


   The heuristic that decides at each point of the discourse whether to consider the start of a
span with a different reporting mode for events relies on the following conditions: whether
the syntactic structure of the given clause includes embedded subclauses, whether the main
verb of the clause is considered to change the mode of reporting–as defined by a resource file
that lists verbs of this type–and whether the clause starts with opening quotation marks. The
decision on the end of a span of discourse relies on the following conditions: whether the clause
ahead involves changes in tense, voice and mood; whether the current clause already opened a
span of reported discourse–which would make it a main clause and necessarily followed by at
least one subordinate subclause– and whether closing quotation marks appear at the end of the
subclause.
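
The interplay between the start and end conditions can be illustrated with a deliberately reduced sketch. It assumes that each clause has already been reduced to a small record, it ignores quotation marks and voice, and it uses a change of tense in the clause ahead as the only end condition. The verb list REPORTING_VERBS is an illustrative stand-in for the resource file mentioned above, whose contents are not given here; all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clause:
    text: str
    tense: str
    main_verb: str
    has_subclause: bool


# Illustrative stand-in for the resource file of mode-changing verbs.
REPORTING_VERBS = {"predicts", "foretells", "orders", "warns"}


def segment(clauses: List[Clause]) -> List[Tuple[int, int]]:
    """Return (first, last) clause indices of each embedded span."""
    spans = []
    open_starts = []                    # stack of start indices of open spans
    for i, clause in enumerate(clauses):
        following = clauses[i + 1] if i + 1 < len(clauses) else None
        opens = (
            following is not None
            and clause.has_subclause
            and clause.main_verb in REPORTING_VERBS
        )
        if opens:
            open_starts.append(i + 1)   # the span starts at the next clause
        elif open_starts and (following is None or following.tense != clause.tense):
            spans.append((open_starts.pop(), i))
    while open_starts:                  # close anything still open at the end
        spans.append((open_starts.pop(), len(clauses) - 1))
    return sorted(spans)


STORY = [
    Clause("A tsar has his queen and their son Ivan.", "present", "has", False),
    Clause("The groom predicts", "present", "predicts", True),
    Clause("a sister will be born", "future", "born", True),
    Clause("who shall be a terrible witch.", "future", "be", False),
    Clause("The sister shall devour her father and mother and all people "
           "under their command.", "future", "devour", False),
    Clause("Ivan asks the tsar for permission to go out for a walk.",
           "present", "asks", True),
]

spans = segment(STORY)
for first, last in spans:
    print(" ".join(c.text for c in STORY[first:last + 1]))
```

A clause that opens a span is never also treated as closing one, which mirrors the condition that a clause opening reported discourse must be followed by at least one subordinate subclause. The stack allows spans to be opened inside other spans, although this simplified version closes at most one level per clause.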


4. Discussion
The described solution can be discussed from two points of view: in terms of recall and precision
metrics of the outcomes of the solution with respect to the manually annotated reference corpus
and in terms of how it compares with existing solutions for similar tasks.

4.1. Metrics over Reporting Mode Spans
To provide a quantitative evaluation of the quality of the results, we consider: precision and
recall over positions of embedded story starts, precision and recall over positions of embedded
story ends, and precision and recall over bags of words identified for specific spans as annotated
in the reference corpus. The current outputs for the Propp tales are shown in Table 5 and
current outputs for the D’Aulnoy tales are shown in Table 6.
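
For reference, the two kinds of measurement can be computed as in the following sketch. It assumes that positions are compared by exact match and that bags of words are compared as multisets; the precise matching criteria used to obtain the reported figures are not detailed here, and the toy inputs below are invented for illustration and are not taken from the corpus.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def precision_recall(
    predicted: Iterable[Hashable], reference: Iterable[Hashable]
) -> Tuple[float, float]:
    """Precision and recall of predicted items against reference items.

    Both inputs are treated as multisets, so the same function serves for
    boundary positions and for bags of words.
    """
    pred, ref = Counter(predicted), Counter(reference)
    overlap = sum((pred & ref).values())        # multiset intersection
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


# Toy span-start positions (word offsets), invented for illustration.
start_p, start_r = precision_recall([3, 10, 17], [3, 10, 21, 30])

# Toy bag of words for one span, invented for illustration.
bag_p, bag_r = precision_recall(
    "the sister shall devour".split(),
    "the sister shall devour her father".split(),
)

print(start_p, start_r)
print(bag_p, bag_r)
```

Under this formulation a span whose end is placed too early keeps perfect precision over its bag of words but loses recall, which is one way in which boundary errors and word-level scores can diverge.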
   Results for the two different data sets are shown separately due to the significant differences
that exist between them. The synopses for Propp tales are telegraphic transcriptions of the
formal analysis described in shorthand in Propp’s book. The synopses for D’Aulnoy’s tales
are more elaborate descriptions provided by Williams in her thesis [10]. The differences in the
complexity of the language employed give rise to the differences in scores. Nevertheless, results
are comparable across the two sets.
   Reasonably high (0.84 and 0.90) precision on starts means that most of the starts of embedded
stories found by the parse are indeed starts that appear marked in the reference. The lower
values (0.62 and 0.40) for recall on starts mean that a certain proportion of the starts marked
are not being located correctly. This suggests that there are further features that identify the
start of these spans, beyond verbs that indicate changes in reporting mode, that would need to
be considered. Reasonably high (0.84 and 0.85) precision on ends means that most of the ends
of embedded stories found by the parse are indeed ends that appear marked in the reference.
Low (0.62 and 0.37) recall on ends means that a large proportion of the ends marked are not
being located correctly. However, it is important to note that, because the procedure treats
the location of starts and ends separately, the values for location of ends will be considerably
affected by any errors in the identification of the corresponding starts. Relatively high values
on precision (0.95 and 0.80) and recall (0.92 and 0.89) for the identification of words belonging
to specific spans suggest that, once the values for locating starts and ends are improved, the
procedure would work well.
   It is also important to consider that some of the incorrect diagnoses leading to low values
in precision and recall may be traced back to errors in the parser. The results of the
parser chosen for the task very often misrepresent the relations between the constituents of
the sentence. For instance, the parses of sentences such as “the girl orders the gendarmes to
destroy the garden” treat “the gendarmes” as the subject of the clause “to destroy” and not
as the direct object of “orders”. Such errors may explain why precision and recall for retrieving
words for a specific span do not reach maximal values. A solution enhanced with the results of
a dependency parse will be considered as further work and should help improve results.

4.2. Comparison with Prior Work
The fundamental challenge for the development of adequate machine learning solutions to tasks
of segmenting/classifying reported discourse in text has been identified as the lack of training
resources of appropriate size and coverage, both in the context of attribution relations [4] and
narrative levels [7]. The corpus developed by [4] constitutes a valuable resource for the specific
task of identifying attribution relations, but it does not provide coverage for phenomena beyond
that. The work reported in [7] proposes a solution that relies on the generation of synthetic
data to augment automatically a small data set used as seed. The resulting data set is valuable
for the annotation of narrative levels but does not consider other phenomena.
   As long as there are no data sets large enough to train machine learning solutions to the task,
the proposed solution may provide a simple baseline that obtains results with acceptable values
of precision and recall over basic features. Whereas the segmentations generated are clearly far
from perfect, they can provide a valuable starting point for any efforts on developing corpora
with the assistance of human annotators. At worst, they can be considered as a way to help
create the seed data sets required for automatic augmentation via data synthesis.


5. Conclusions
A procedure for detecting spans of text that correspond to different modes of reporting events
has been proposed. The proposed procedure operates directly on a syntactic parse of a story,
without requiring any special depths of semantic analysis. It is also designed to operate on
inputs based on their syntactic structure and a lexical resource that identifies verbs which
indicate a change in reporting mode. These characteristics allow it to operate as a baseline
solution for obtaining an initial segmentation in the absence of sufficient volumes of annotated
data to train machine learning solutions.


Acknowledgments
This paper has been partially supported by the CANTOR project (PID2019-108927RB-I00) funded
by the Spanish Ministry of Science and Innovation.


References
 [1] C. D. Manning, M. Surdeanu, J. Bauer, J. R. Finkel, S. Bethard, D. McClosky, The Stanford
     CoreNLP Natural Language Processing Toolkit, in: Proceedings of the 52nd Annual
     Meeting of the Association for Computational Linguistics, ACL 2014, June 22-27, 2014,
     Baltimore, MD, USA, System Demonstrations, The Association for Computational Linguistics,
     2014, pp. 55–60.
 [2] P. Portner, Modality, OUP Oxford, 2009.
 [3] D. Y. Oshima, Perspectives in reported discourse, Ph.D. thesis, Stanford University,
     2006.
 [4] S. Pareti, Attribution: a computational approach, Ph.D. thesis, The University of Edinburgh,
     2015.
 [5] A. C. Murphy, Markers of attribution in English and Italian opinion articles: A comparative
     corpus-based study, ICAME journal 29 (2005) 131–150.
 [6] P. Papalampidi, F. Keller, M. Lapata, Movie plot analysis via turning point identification,
     arXiv preprint arXiv:1908.10328 (2019).
 [7] N. Reiter, J. Sieker, S. Guhr, E. Gius, S. Zarrieß, Exploring text recombination for automatic
     narrative level detection, in: Proceedings of the Thirteenth Language Resources and
     Evaluation Conference, European Language Resources Association, Marseille, France,
     2022, pp. 3346–3353.
 [8] V. I. Propp, Morphology of the folktale, University of Texas Press, 1968.
 [9] M. d’Aulnoy, Contes des fées, L. Duprat-Duverger, 1866.
[10] E. D. Williams, The Fairy Tales by Madame d’Aulnoy, Ph.D. thesis, Rice University,
     Houston, Texas, 1982.

