<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Segmenting Narrative Synopses on Event Reporting Mode based on Heuristics on Constituency Parses</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pablo Gervás</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Facultad de Informática, Universidad Complutense de Madrid</institution>
          ,
          <addr-line>Madrid, 28040</addr-line>
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A narrative of relative complexity will often include events beyond simple facts that happen in the story world (such as plans, orders or wishes of the characters) or references to events that happen in a different story world (stories or anecdotes being told by the characters). These correspond to different modes of reporting events. Identifying spans of text corresponding to different modes of reporting is a significant challenge. The present paper proposes a mechanism for segmenting the text of synopses of narrative plots into spans that correspond to different views of the storyworld in terms of temporal, spatial or modal coordinates. This is achieved by considering syntactic structure to identify cues for the start of embedded discourses and continuity over features such as tense, voice or mode to identify the points where the embedded discourse ends. This process can handle embedded discourses that span several sentences and recursive nesting of discourses. The solution is tested against a corpus of synopses hand-annotated for start and end of embedded discourses. Acceptable precision and recall metrics are reported.</p>
      </abstract>
      <kwd-group>
        <kwd>modes of reporting events</kwd>
        <kwd>embedded discourse</kwd>
        <kwd>plot synopses</kwd>
        <kwd>Stanford Core NLP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The events included in a narrative do not always constitute a sequence of events that are equally
true in the storyworld in a steady succession. Sometimes the narrative refers to events that are
not held to be true–but rather inform about wishes or plans of characters–or that were true
at an earlier point in time. The start of a span of text affected by such instances is often
indicated linguistically by means of specific syntactic constructions, such as infinitive sentences,
modal verbs or reported speech constructions. The end of the span of text that is affected may
be indicated by changes in tense or voice. These phenomena constitute an important challenge
for the interpretation of narrative text, because even assuming a certain ability to construct a
conceptual representation of each event from the given text, the interpreter is still faced with
the need to decide in what mode–as a truth or as a wish–and with what temporal and spatial
coordinates to add it to the representation being constructed for the storyworld.</p>
      <p>Examples of this type of sentence are: “the girl orders the gendarmes to destroy the garden” or
“The forecaster warns the weather may be cold.” These sentences describe a principal event–an
order or a warning–which involves a secondary event–destroying the garden or the coming of
cold weather. In each case, the meaning of the verb includes connotations about the distance
between the secondary event and the principal event. Sometimes a sentence introduces an
embedded discourse and then the reporting mode introduced by that embedded discourse
continues over a number of sentences. Consider the following example:</p>
      <p>A tsar has his queen and their son Ivan. The groom predicts [a sister will be born
who shall be a terrible witch. The sister shall devour her father and mother and
all people under their command.] Ivan asks the tsar for permission to go out for a
walk.</p>
      <p>The second sentence introduces a prophecy. The third sentence continues describing parts
of the prophecy. The fourth sentence returns to the telling of the main story, and no longer
refers to the prophecy. In terms of changes in reporting mode, the span of text corresponding
to the prophecy should be identified as the one marked between square brackets. This is the
type of task we want to address. We work on the hypothesis that such changes in mode of
reporting may be flagged by changes in the features of the clauses that determine the time/space
coordinates for the event: tense, voice, mode. In the example, the events corresponding to the
prophecy are all presented in future tense.</p>
      <p>
        The present paper explores the viability of developing such a computational solution for
identifying spans of text corresponding to different modes of reporting events based on the
Stanford Core NLP tool for English [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Previous Work</title>
      <p>The distinction between modes of reporting for a narrative that is considered in this paper
is based on pragmatic criteria: it is intended to inform a process of representation of the
information contained in the narrative, and it is concerned with clustering the events reported
into sets that are attributed a similar degree of certainty in the same possible world.</p>
      <p>
        From the point of view of linguistics, there is a wide range of phenomena that might
influence this kind of classification/clustering task. The most relevant is modality, defined as
the way in which statements in a language may be marked in terms of their relation with reality
or truth [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In the particular case of English, modality is most often expressed in terms of
auxiliaries, such as may or can, but may also be expressed lexically, with verbs such as want or
need.
      </p>
      <p>
        Another relevant phenomenon is reported discourse, which appears when an agent reports
discourse originally contributed by a different agent, and which may itself be an utterance or a
belief [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Reported discourse may appear as direct speech–the reporting agent conveys the
exact words of the reported agent–or as indirect speech–the reporting agent paraphrases what
the reported agent said. Direct speech is typographically marked by presenting the reported
discourse between quotes. Indirect speech is usually introduced by a complementizer (in
English, that).
      </p>
      <p>
        Instances of reported discourse may be further characterised by their attribution relations
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The attribution relation for a statement involving reported discourse explicitly encodes
the reported discourse, the speaker (or source) and some cue indicating the attitude attributed
to the speaker. Attributions are marked in English by the use of specific prepositions, specific
lexical phrases, specific reporting verbs and the verbs preceding that-clauses [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        The work of [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] develops a system for the automatic extraction of attribution relations. The
proposed system relies essentially on a k-NN classifier for identifying verbs that act as cues for
attribution, a conditional random field labeller to identify the span of text that corresponds to
the reported discourse, and a logistic regression classifier to identify the entity that is presented
as the source. A number of additional components help to refine the specific entities involved
in cases where they are conveyed in the text by complex expressions spanning several tokens.
      </p>
      <p>
        A different approach addresses the task of segmenting plot synopses by identifying turning
points in the narrative [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This system relies on a corpus of synopses annotated with turning
points–stages in the structure of the narrative–over which a neural network model is trained to
identify the turning points. One important argument presented by these authors is that working
on synopses presents significant advantages: (1) the shorter format makes annotation easier,
so the effort is easier to scale, (2) interannotator agreement is likely to be higher for synopses,
given that synopses are written at a higher level of abstraction.
      </p>
      <p>
        When characters in a story report events by themselves telling a story, each such telling
is considered to introduce a new narrative level. The task of annotating narrative levels over
long texts has been addressed by [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The authors outline the difficulties presented by a lack of
corpora annotated with the relevant information and propose a solution based on extending a
small annotated sample with synthetically created data. In terms of the task itself, they propose
a model of the task as one of segmentation of the text by identifying boundaries of narrative
levels in the text.
      </p>
      <p>
        The procedure described in this paper relies on the constituency parse provided by the
Stanford Core NLP toolkit [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The constituency parse for a sentence is used as the main data
structure to drive the process because it respects the relative order of appearance of the words in
the input text. This is relevant for the procedure employed for identifying separate spans of text
that correspond to different modes of reporting events, as they will be assigned to continuous
spans of the text.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Segmenting Narrative Text into Spans with Different Reporting Mode</title>
      <p>A faithful reconstruction of the meaning of a given story requires that the different modes of
reporting be identified and correctly assigned to each and every one of the events mentioned.
One possible way of achieving this is by identifying the spans of text corresponding to each
reporting mode and the set of events mentioned in each span.</p>
      <sec id="sec-4-1">
        <title>3.1. A Corpus of Plot Synopses</title>
        <p>Because the aim of this research initiative is to explore the application of text processing tools
to the task of identifying the meaning of a narrative in terms of the storyworld it describes, it
was considered more practical to operate on synopses of larger narratives rather than on the
narratives themselves. This is based on the assumption that synopses attempt to condense the
structure of the plot for a given narrative, leaving out the details that add value to the work but
not necessarily to the general structure.</p>
        <p>
          To inform this process, a small corpus of plot synopses has been considered. This corpus
draws upon two different sources: the synopses of the set of plots for Russian folk tales originally
analysed by Vladimir Propp [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and the synopses of the set of French fairy tales by Madame
d’Aulnoy [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] as annotated in terms of Proppian character functions by [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This corpus has
been manually annotated to mark the start and end of all instances of embedded
discourse which differ from other contributions to the text in the relative distance between the
subject of the main clause and the reality of the embedded discourse. These distances sometimes
involve modality, time or even a different storyworld.
        </p>
        <p>The statistics of the spans of reporting mode that appear in the manual annotation are
presented in Table 1 for the Propp tales and in Table 2 for the D’Aulnoy tales.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. A Text Segmentation Process</title>
        <p>The process of text segmentation needs to be informed by the following factors: the set of
events reported by the text (often a single sentence in the text reports more than one event,
each represented by a particular subclause), the syntactic structure of sentences whose verbs
indicate a specific reporting mode (such sentences are likely to imply a change in reporting
mode between the event reported by the main clause and any reported in clauses subordinate
to the main verb) and punctuation marks used to indicate relative groupings of subclauses
(commas, brackets, or quotation marks).</p>
        <p>An extraction procedure is proposed that builds a sequence of subclauses for each sentence,
each corresponding to a subtree describing an event. We develop heuristics for identifying
points in the narrative where the events being told differ significantly in the mode in which they
are reported, and consider these points as candidates for breaks between spans for different
reporting modes.</p>
        <p>The different reporting modes that we have identified as being marked in this way are: events
that occur in significantly different temporal coordinates (flashbacks, flashforwards), events
that are narrated in a different modality (wishes, plans, prophecies, curses, orders), events that
are specifically marked as being narrated by particular characters (indirect speech, reported
speech).</p>
        <sec id="sec-4-2-1">
          <title>3.2.1. Sub-Clause Extraction</title>
          <p>A sentence with subordinate clauses will need to be broken down into its constituent parts. The
relations between these parts are captured by assigning labels to identify subordinate clauses
and adding these labels in argument positions in the main clause to represent the subordination
relation.</p>
          <p>The result of this process is a basic tree that represents the main clause as a simple structure
with either noun phrases or subclause identification nodes as arguments. An example of this
process is shown in Table 3.</p>
          <p>A recursive search for subclauses is applied. Recursion allows consideration of subclauses
that appear as adjuncts to noun phrases in other clauses. The types of subclauses that may
appear nested within a noun phrase in the constituency trees produced by the Stanford parser
are: relative clauses (“[The boy who was hungry] asked for a snack”), past participle sentences
(“[The horses bought at the market] proved worthless”) and gerund sentences (“He found [a
ring belonging to his mother]”). In each example, the part of the sentence corresponding to the
subclause attached to the noun phrase has been marked in brackets. When extracting this type
of subclause, duplication of the subject may be required. An example of this process is shown
in Table 4.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>3.3. Identifying Embedded Discourse Boundaries</title>
        <p>For each sentence, the set of subclauses extracted from it is ordered so that relations of
embedding are presented in the same order as in the discourse: main clause followed by
embedded clause. The procedure is applied in succession to the resulting sequence of subclauses.</p>
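        <p>As a minimal sketch of how such an ordered sequence could be obtained, the Python fragment below reads a bracketed constituency parse in the Penn Treebank notation used by the Stanford parser and returns the main clause followed by its embedded clauses, with each embedded clause replaced in its parent by a placeholder label. The function names, the placeholder format (C0, C1, ...), the hand-written example parse and the decision to treat only nodes labelled S as clauses are assumptions made for illustration; they are not taken from the system described in this paper.</p>
        <preformat>
```python
import re

# Assumption: only nodes labelled "S" open a new clause in this sketch.
CLAUSE_LABELS = {"S"}


def parse_brackets(text):
    """Parse a bracketed tree into (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def extract_clauses(tree):
    """Return [(clause_id, words)], main clause first, embedded clauses after."""
    if tree[0] == "ROOT":
        tree = tree[1][0]             # start from the sentence node under ROOT
    clauses = []

    def walk(node, words):
        for child in node[1]:
            if isinstance(child, str):
                words.append(child)
            elif child[0] in CLAUSE_LABELS:
                words.append(open_clause(child))  # leave a placeholder behind
            else:
                walk(child, words)

    def open_clause(node):
        clause_id = "C" + str(len(clauses))
        words = []
        clauses.append((clause_id, words))
        walk(node, words)
        return clause_id

    open_clause(tree)
    return clauses


# Hand-written parse of an example sentence; real parser output may differ.
PARSE = (
    "(ROOT (S (NP (DT The) (NN girl))"
    " (VP (VBZ orders)"
    " (S (NP (DT the) (NNS gendarmes))"
    " (VP (TO to) (VP (VB destroy) (NP (DT the) (NN garden)))))"
    ")))"
)

for clause_id, words in extract_clauses(parse_brackets(PARSE)):
    print(clause_id, " ".join(words))
```
        </preformat>
        <p>For this example the sketch yields the main clause “The girl orders C1” followed by the embedded clause “the gendarmes to destroy the garden”. Handling SBAR nodes, or duplicating the subject for clauses nested within noun phrases, would require additional rules.</p>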
        <p>For each clause identified in the manner described above, the system extracts the following
information: voice, tense and main verb. The system also considers the following additional
aspects that play a role in boundary identification: whether the clause includes a subordinated
clause, whether the main verb of the clause has connotations of change in reporting mode,
whether the clause includes a clause-grouping punctuation sign (opening or closing). Finally,
given the recursive nature of embedded discourse, the system maintains stack data structures
that allow it to keep track at any given point of: the depth of levels of quotation and the depth
of levels of embedding.</p>
        <p>At each point, a look-ahead stage checks whether the following clause
starts with quotation marks, or whether it differs in tense/voice/mode from the current
one.</p>
        <p>The heuristic that decides at each point of the discourse whether to consider the start of a
span with a different reporting mode for events relies on the following conditions: whether
the syntactic structure of the given clause includes embedded subclauses, whether the main
verb of the clause is considered to change the mode of reporting–as defined by a resource file
that lists verbs of this type–and whether the clause starts with opening quotation marks. The
decision on the end of a span of discourse relies on the following conditions: whether the clause
ahead involves changes in tense, voice and mood; whether the current clause already opened a
span of reported discourse–which would make it a main clause and necessarily followed by at
least one subordinate subclause–and whether closing quotation marks appear at the end of the
subclause.</p>
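        <p>The start and end conditions above can be illustrated with the following Python sketch, which is a simplified reading of the heuristic and not the implementation evaluated in this paper. The clause features are filled in by hand, the set MODE_VERBS is a hypothetical stand-in for the resource file of mode-changing verbs, and the choice of which features the look-ahead compares (only tense by default) is an assumption of the sketch. The input is an abridged version of the tsar example from the Introduction.</p>
        <preformat>
```python
from dataclasses import dataclass

# Hypothetical stand-in for the resource file of mode-changing verbs.
MODE_VERBS = {"predict", "order", "warn"}


@dataclass
class Clause:
    text: str
    verb: str                    # lemma of the main verb
    tense: str
    voice: str = "active"
    has_subclause: bool = False  # the clause governs an embedded subclause
    opens_quote: bool = False
    closes_quote: bool = False


def segment(clauses, compare=("tense",)):
    """Return (start, end) clause indices, inclusive, of embedded spans."""
    spans, stack = [], []        # stack holds start indices of open spans
    last = len(clauses) - 1
    for i, clause in enumerate(clauses):
        opens_by_verb = clause.verb in MODE_VERBS and clause.has_subclause
        if clause.opens_quote:
            stack.append(i)      # a quoted span includes the current clause
        if stack and not opens_by_verb:
            # Look ahead: does the next clause differ in the compared features?
            if i == last:
                changed = True
            else:
                following = clauses[i + 1]
                changed = any(getattr(following, f) != getattr(clause, f)
                              for f in compare)
            if clause.closes_quote or changed:
                spans.append((stack.pop(), i))
        if opens_by_verb and i != last:
            stack.append(i + 1)  # the span starts at the embedded clause
    while stack:                 # close anything still open at the end
        spans.append((stack.pop(), last))
    return spans


SYNOPSIS = [
    Clause("A tsar has his queen and their son Ivan.", "have", "present"),
    Clause("The groom predicts", "predict", "present", has_subclause=True),
    Clause("a sister will be born", "bear", "future", voice="passive"),
    Clause("who shall be a terrible witch.", "be", "future"),
    Clause("The sister shall devour her father and mother.", "devour", "future"),
    Clause("Ivan asks the tsar for permission to go out for a walk.", "ask", "present"),
]

print(segment(SYNOPSIS))
```
        </preformat>
        <p>With tense as the only compared feature, the sketch returns the single span (2, 4), which covers the three clauses of the prophecy. Comparing voice as well, by passing compare=("tense", "voice"), closes the span after the passive clause at index 2, which shows how sensitive the end decision is to the choice of features.</p>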
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Discussion</title>
      <p>The described solution can be discussed from two points of view: in terms of recall and precision
metrics of the outcomes of the solution with respect to the manually annotated reference corpus
and in terms of how it compares with existing solutions for similar tasks.</p>
      <sec id="sec-5-1">
        <title>4.1. Metrics over Reporting Mode Spans</title>
        <p>To provide a quantitative evaluation of the quality of the results, we consider: precision and
recall over positions of embedded story starts, precision and recall over positions of embedded
story ends, and precision and recall over bags of words identified for specific spans as annotated
in the reference corpus. The current outputs for the Propp tales are shown in Table 5 and
current outputs for the D’Aulnoy tales are shown in Table 6.</p>
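        <p>As an illustration of how such metrics can be computed, the following Python sketch compares predicted and reference items as multisets, which covers both boundary positions and bags of words. The function name and all values are hypothetical toy data chosen for the example; they are not taken from the corpus or from the reported results, and the exact matching criteria used in the evaluation may differ.</p>
        <preformat>
```python
from collections import Counter


def precision_recall(predicted, reference):
    """Compare two collections as multisets and return (precision, recall)."""
    pred, ref = Counter(predicted), Counter(reference)
    # Items found in both, counted with multiplicity.
    overlap = sum(min(count, ref[item]) for item, count in pred.items())
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


# Toy values for illustration only; they are not taken from the corpus.
predicted_starts = [12, 40, 77]
reference_starts = [12, 40, 58, 90]
print(precision_recall(predicted_starts, reference_starts))

predicted_words = "a sister will be born".split()
reference_words = "a sister will be born who shall be a terrible witch".split()
print(precision_recall(predicted_words, reference_words))
```
        </preformat>
        <p>In the first toy case two of the three predicted starts match the reference, giving a precision of 2/3 and a recall of 2/4. In the second, every predicted word belongs to the reference span, so precision is 1.0, while recall is 5/11 because the predicted span stops too early.</p>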
        <p>
          Results for the two different data sets are shown separately due to the significant differences
that exist between them. The synopses for Propp tales are telegraphic transcriptions of the
formal analysis described in shorthand in Propp’s book. The synopses for D’Aulnoy’s tales
are more elaborate descriptions provided by Williams in her thesis [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The differences in the
complexity of the language employed give rise to the differences in scores. Nevertheless, results
are comparable across the two sets.
        </p>
        <p>Reasonably high (0.84 and 0.90) precision on starts means that most of the starts of embedded
stories found by the parse are indeed starts that appear marked in the reference. The lower
values (0.62 and 0.40) for recall on starts mean that a certain proportion of the starts marked
are not being located correctly. This suggests that there are further features that identify the
start of these spans, beyond verbs that indicate changes in reporting mode, that would need to
be considered. Reasonably high (0.84 and 0.85) precision on ends means that most of the ends
of embedded stories found by the parse are indeed ends that appear marked in the reference.
Low (0.62 and 0.37) recall on ends means that a large proportion of the ends marked are not
being located correctly. However, it is important to note that, because the procedure treats
the location of starts and ends separately, the values for location of ends will be considerably
affected by any errors in the identification of the corresponding starts. Relatively high values
on precision (0.95 and 0.80) and recall (0.92 and 0.89) for the identification of words belonging
to specific spans suggest that, once the values for locating starts and ends are improved, the
procedure would work well.</p>
        <p>It is also important to consider that some of the incorrect diagnoses leading to low values
in precision and recall may be traced back to error rates in the parser. The results of the
parser chosen for the task very often misrepresent the relations between the constituents of
the sentence. For instance, the parses of sentences such as “the girl orders the gendarmes to
destroy the garden” treat “the gendarmes” as subject of the clause “to destroy” and not
as direct object of the clause “orders”. This may explain why precision and recall for retrieving
words for a specific span do not reach maximal values. A solution enhanced with the results of
a dependency parse will be considered as further work and should help improve results.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Comparison with Prior Work</title>
        <p>
          The fundamental challenge for the development of adequate machine learning solutions to tasks
of segmenting/classifying reported discourse in text has been identified as the lack of training
resources of appropriate size and coverage, both in the context of attribution relations [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and
narrative levels [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The corpus developed by [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] constitutes a valuable resource for the specific
task of identifying attribution relations, but it does not provide coverage for phenomena beyond
that. The work reported in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] proposes a solution that relies on the generation of synthetic
data to augment automatically a small data set used as seed. The resulting data set is valuable
for the annotation of narrative levels but does not consider other phenomena.
        </p>
        <p>As long as there are no data sets large enough to train machine learning solutions to the task,
the proposed solution may provide a simple baseline that obtains results with acceptable values
of precision and recall over basic features. Whereas the segmentations generated are clearly far
from perfect, they can provide a valuable starting point for any efforts on developing corpora
with the assistance of human annotators. At worst, they can be considered as a way to help
create the seed data sets required for automatic augmentation via data synthesis.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusions</title>
      <p>A procedure for detecting spans of text that correspond to different modes of reporting events
has been proposed. The proposed procedure operates directly on a syntactic parse of a story,
without requiring any special depths of semantic analysis. It is also designed to operate on
inputs based on their syntactic structure and a lexical resource that identifies verbs which
indicate a change in reporting mode. These characteristics allow it to operate as a baseline
solution for obtaining an initial segmentation in the absence of sufficient volumes of annotated
data to train machine learning solutions.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This paper has been partially supported by the CANTOR project (PID2019-108927RB-I00) funded
by the Spanish Ministry of Science and Innovation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Surdeanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bauer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Finkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bethard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>McClosky</surname>
          </string-name>
          ,
          <article-title>The Stanford CoreNLP Natural Language Processing Toolkit</article-title>
          ,
          <source>in: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics</source>
          , ACL 2014, June 22-27, 2014, Baltimore, MD, USA, System Demonstrations, The Association for Computer Linguistics,
          <year>2014</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Portner</surname>
          </string-name>
          , Modality, OUP Oxford,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D. Y.</given-names>
            <surname>Oshima</surname>
          </string-name>
          , Perspectives in reported discourse,
          <source>Ph.D. thesis</source>
          , Stanford University Stanford,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pareti</surname>
          </string-name>
          ,
          <article-title>Attribution: a computational approach</article-title>
          ,
          <source>Ph.D. thesis</source>
          , The University of Edinburgh,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A. C.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <article-title>Markers of attribution in English and Italian opinion articles: A comparative corpus-based study</article-title>
          ,
          <source>ICAME journal 29</source>
          (
          <year>2005</year>
          )
          <fpage>131</fpage>
          -
          <lpage>150</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Papalampidi</surname>
          </string-name>
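          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Keller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lapata</surname>
          </string-name>
          ,
          <article-title>Movie plot analysis via turning point identification</article-title>
          , arXiv preprint arXiv:1908.10328 (
          <year>2019</year>
          ).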
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sieker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Guhr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zarrieß</surname>
          </string-name>
          ,
          <article-title>Exploring text recombination for automatic narrative level detection</article-title>
          ,
          <source>in: Proceedings of the Thirteenth Language Resources and Evaluation Conference</source>
          , European Language Resources Association, Marseille, France,
          <year>2022</year>
          , pp.
          <fpage>3346</fpage>
          -
          <lpage>3353</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V. I.</given-names>
            <surname>Propp</surname>
          </string-name>
          , Morphology of the folktale, University of Texas Press,
          <year>1968</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>d'Aulnoy</surname>
          </string-name>
          ,
          <source>Contes des fées</source>
          , L. Duprat-Duverger,
          <year>1866</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <source>The Fairy Tales by Madame d'Aulnoy, Ph.D. thesis</source>
          , Rice University, Houston, Texas,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>