<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Distinguishing Narration and Speech in Prose Fiction Dialogues</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Adam Ek</string-name>
          <email>adam.ek@gu.se</email>
          <email>adam.ek@ling.su.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mats Wirén</string-name>
          <email>mats.wiren@ling.su.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Stockholm University</institution>
          ,
          <addr-line>SE-106 91 Stockholm</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <fpage>124</fpage>
      <lpage>132</lpage>
      <abstract>
        <p>This paper presents a supervised method for a novel task, namely, detecting elements of narration in passages of dialogue in prose fiction. The method achieves an F1-score of 80.8%, exceeding the best baseline by almost 33 percentage points. The purpose of the method is to enable a more fine-grained analysis of fictional dialogue than has previously been possible, and to provide a component for the further analysis of narrative structure in general.</p>
      </abstract>
      <kwd-group>
        <kwd>Prose fiction</kwd>
        <kwd>Narrative structure</kwd>
        <kwd>Literary dialogue</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Prose fiction typically consists of passages alternating between two levels of
narrative transmission: the narrator’s telling of the story to a narratee, and the
characters’ speaking to each other in that story (mediated by the narrator). As
stated in [
        <xref ref-type="bibr" rid="ref1">Dolezel, 1973</xref>
        ], quoted in [
        <xref ref-type="bibr" rid="ref4">Jahn, 2017</xref>
        , Section N8.1]: "Every narrative
text T is a concatenation and alternation of ND [narrator’s discourse] and CD
[characters’ discourse]". An example of this alternation can be found in August
Strindberg’s The Red Room (1879), with our annotation added to it:
(1)
&lt;NARRATOR&gt;
      </p>
      <p>Olle very skilfully made a bag of one of the sheets and stuffed
everything into it, while Lundell went on eagerly protesting.</p>
      <p>When the parcel was made, Olle took it under his arm, buttoned
his ragged coat so as to hide the absence of a waistcoat, and set out
on his way to the town.
&lt;/NARRATOR&gt;
&lt;CHARACTERS&gt;</p>
      <p>– He looks like a thief, said Sellén, watching him from the window
with a sly smile. – I hope the police won’t interfere with him! – Hurry
up, Olle! he shouted after the retreating figure. Buy six French rolls
and two half-pints of beer if there’s anything left after you’ve bought
the paint.</p>
      <p>&lt;/CHARACTERS&gt;</p>
      <p>
        The work described here is part of a larger effort to develop methods for
analysis and annotation of narrative structure in prose fiction. To this end, the
two discourse levels require different types of analysis: In narrator’s discourse,
we are primarily interested in a sequence of events, the ordering of these, and
how they form a plot. Although this is true also for characters’ discourse, the
fact that the narration is expressed through the speech of the characters makes
the problem different.1 In characters’ speech, we are interested in what is said
(the information conveyed, attitudes, beliefs, sentiments, etc.) and who is
speaking to whom. Fortunately, distinguishing narrator’s and characters’ discourses
in prose fiction is relatively straightforward, although the conventions vary
between authors, works and printed editions. Typically, devices such as dashes,
quotation marks and/or paragraph breaks are used for indicating alternations
between the two types of discourse. Furthermore, the problem of identifying
speakers and addressees in prose fiction has been dealt with by [
        <xref ref-type="bibr" rid="ref2">Ek et al., 2018</xref>
        ],
[
        <xref ref-type="bibr" rid="ref5">Muzny et al., 2017</xref>
        ] and [
        <xref ref-type="bibr" rid="ref3">He et al., 2013</xref>
        ].
      </p>
      <p>There is an additional problem, however: The alternation between narrator’s
and characters’ discourses is not as clear-cut as the quotation from Dolezel above
may imply, since elements of narration can occur inside characters’ discourses
as well, interspersed with lines. One case of this is when the narrator indicates
who is speaking, and possibly who is being addressed. In the example above, the
speaker is indicated in the first and third lines by the speech-framing expressions
(utterances that introduce or accompany instances of direct speech in narratives)
"said Sellén" and "he shouted", respectively. Furthermore, in the third line the
addressee is indicated by a description, "he shouted after the retreating figure".
This addressee is also indicated by a vocative, "[Hurry up,] Olle", but that is
part of the speech and hence does not belong to the narration.</p>
      <p>Another case is when the narrator describes how something is being said or
what is happening during the speech: "watching him from the window with a
sly smile" in the first line, and "he shouted after the retreating figure" in the
third line. In the latter example, "after the retreating figure" also illustrates
that these elements of narration may serve more than one function, in this case
both indicating who is the addressee and describing his activity.</p>
      <p>Given that we want to analyse narrative structure, it is important to
distinguish these elements of narration inside characters’ discourses for several
reasons: First, we need to recognize speech-framing expressions for the purpose of
identifying speakers, and vocatives and other constructions for the purpose of
identifying addressees. Furthermore, we need to extract other elements of
narration related to the quality of the speech or the situation to determine their
contributions to the overall plot.
1 We refer to this interchangeably as "characters’ discourse" or "dialogue". For the
purpose of this work, we are only concerned with direct speech as opposed to indirect
speech, the latter of which we here regard as part of the narration.</p>
      <p>To the best of our knowledge, this paper provides the first approach to the
problem of distinguishing narration and speech within passages of dialogue in
prose fiction.
</p>
    </sec>
    <sec id="sec-2">
      <title>Data</title>
      <p>The data consists of excerpts from four novels by Swedish authors (in Swedish):
August Strindberg’s The Red Room (1879), Hjalmar Söderberg’s The Serious
Game (1912), Birger Sjöberg’s The Quartet That Split Up, part I (1924), and
Karin Boye’s Kallocain (1940). The number of lines and tokens are shown in
Table 1. In total, 52.7% of the lines contain narration and 15.6% of all the
tokens in the lines belong to narration.</p>
      <p>We began by extracting all the passages of dialogue (characters’ discourse)
in the works, each consisting of one or several lines. By "line" we here mean
both the direct speech of the characters and any narration interspersed with
this, in such a way that a dialogue passage is completely divided into lines, as
exemplified in (2). Each dialogue was annotated by the authors using the opening
and closing tags (&lt;NC&gt;, &lt;/NC&gt; for "narrative constructions") to demarcate the
narration in a line, for example:
(2)
– He looks like a thief, &lt;NC&gt; said Sellén, watching him from the
window with a sly smile. &lt;/NC&gt;
– I hope the police won’t interfere with him!
– Hurry up, Olle! &lt;NC&gt; he shouted after the retreating figure. &lt;/NC&gt;
Buy six French rolls and two half-pints of beer if there’s anything
left after you’ve bought the paint.
</p>
    </sec>
    <sec id="sec-3">
      <title>Method</title>
      <sec id="sec-3-1">
        <title>Task</title>
        <p>The problem was regarded as binary classification, where the task was to
classify each token in a line as to whether it was an element of narration or not.
Put differently, the task can be regarded as narration detection in passages of
dialogue. To solve this problem, a logistic regression model was used.</p>
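        <p>As a concrete illustration of this framing (a sketch of ours, not the authors' released code), each token becomes one training instance with a binary narration label. The feature names and toy data below are invented for the example:</p>

```python
# Sketch of the task framing (ours, not the authors' released code):
# each token in a line is one instance, labelled 1 if it is narration.
# The feature names and toy data below are invented for the example.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

tokens = [
    {"lemma": "say", "speech_verb": True},
    {"lemma": "he", "speech_verb": False},
    {"lemma": "thief", "speech_verb": False},
    {"lemma": "shout", "speech_verb": True},
]
labels = [1, 1, 0, 1]

vec = DictVectorizer()
X = vec.fit_transform(tokens)
clf = LogisticRegression().fit(X, labels)

# Predict the label of a new token from its feature dictionary.
pred = clf.predict(vec.transform([{"lemma": "say", "speech_verb": True}]))
print(int(pred[0]))
```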
      </sec>
      <sec id="sec-3-2">
        <title>Features</title>
        <p>
          The data was preprocessed using the Swedish annotation pipeline of efselab.2
The pipeline uses Stagger [<xref ref-type="bibr" rid="ref7">Östling, 2013</xref>] for part-of-speech tagging and
tokenization, and MaltParser [
          <xref ref-type="bibr" rid="ref6">Nivre et al., 2006</xref>
          ] for dependency parsing. The final
output is a CoNLL-U file3. A list of speech verbs was compiled manually by the
authors from the corpus.
        </p>
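        <p>For illustration, the CoNLL-U output can be consumed with a short reader such as the sketch below (ours, not part of efselab); only the columns relevant to the features are kept:</p>

```python
# Minimal CoNLL-U reader (our sketch, not part of efselab): keeps the
# ID, FORM, LEMMA, UPOS, HEAD and DEPREL columns of each token.
def read_conllu(text):
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):       # comment line
            continue
        if not line:                   # blank line terminates a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        current.append({
            "id": cols[0], "form": cols[1], "lemma": cols[2],
            "upos": cols[3], "head": cols[6], "deprel": cols[7],
        })
    if current:
        sentences.append(current)
    return sentences

sample = ("1\tsade\ts\u00e4ga\tVERB\t_\t_\t0\troot\t_\t_\n"
          "2\tSell\u00e9n\tSell\u00e9n\tPROPN\t_\t_\t1\tnsubj\t_\t_\n")
sents = read_conllu(sample)
print(len(sents), sents[0][0]["lemma"], sents[0][1]["deprel"])
```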
        <p>
For each token in a line, a set of features was extracted to predict whether
the token was part of narration or not. In addition to the features of the
current token, features were extracted from the four preceding and four succeeding
tokens. The features used are described below and summarized in Table 2.
– Speech verb: The token is a speech verb.
– Lemma: The lemma form of the token.
– Part-of-Speech: Part-of-speech tag of the token.
– Punctuation: Token is any punctuation mark.
– Exclamation/question mark: Token is an exclamation or question mark.
– Grammatical features: The grammatical features of the token.4
– Dependency relation: Dependency tag of the token.
– Dependency root: The token is the head (root) word of the sentence.
– Sentence ID: Each sentence in a line is numbered. Based on this, the ID
of the sentence that the token belongs to (first, second, etc.).
– Sentence termination: The punctuation symbol used to terminate the
sentence that the token is part of.
– Unit ID: Each line is segmented into numbered units delimited by
punctuation marks. For example, the line:
[– He looks like a thief, said Sellén, watching him from the window with a
sly smile.]
is segmented as follows:
[– (He looks like a thief,)0 (said Sellén,)1 (watching him from the window
with a sly smile.)2]
2 https://github.com/robertostling/efselab.
3 https://universaldependencies.org/format.html
4 For further information, see: https://universaldependencies.org/u/feat/index.html.
Punctuation tokens were excluded from the classification, since we did not
consider it meaningful to predict whether they belonged to narration or not. They
were still used as features for other tokens, however.
The logistic regression model was implemented using Python 3 and the sklearn
package [
          <xref ref-type="bibr" rid="ref8">Pedregosa et al., 2011</xref>
          ].5 The data was split into 5% development and
95% train/test. The model was evaluated using 10-fold cross-validation with
10% of the data used for testing in each fold. To estimate the performance of
the logistic regression model, it was compared against two baselines based on
speech verbs, described below.6
– SV → End: The first baseline looks for lines which begin with direct speech
and end with narration, for example (where boldface indicates narration):
[– He looks like a thief, said Sellén, watching him from the window
with a sly smile.]
This baseline is found by identifying the first speech verb in the line and
then labelling the speech verb and all subsequent tokens as narration.
– SV → Punctuation: The second baseline is a more specific extension of
the first, which additionally captures lines where narration is surrounded by
speech, for example:
[– Hurry up, Olle! he shouted after the retreating figure. Buy six French
rolls and two half-pints of beer if there’s anything left after you’ve bought
the paint.]
This baseline is found by identifying the first speech verb in the line and
then labelling the speech verb and all subsequent tokens as narration until
a sentence-terminating punctuation mark (. ! ?) is encountered.
5 All resources and code used in this paper are available at https://github.com/
adamlek/sv_narration.
6 As with the model, none of the baselines label punctuation tokens.
        </p>
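        <p>The two baseline heuristics can be reconstructed directly from their descriptions. The sketch below is our rendering; an invented two-item speech-verb list stands in for the manually compiled one:</p>

```python
# Reconstruction of the two baseline heuristics from their descriptions
# (our sketch; the manually compiled speech-verb list is replaced by an
# invented two-item stand-in).
SPEECH_VERBS = {"said", "shouted"}
TERMINATORS = {".", "!", "?"}

def sv_end(tokens):
    # SV -> End: label the first speech verb and all subsequent tokens.
    labels = [0] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPEECH_VERBS:
            for j in range(i, len(tokens)):
                labels[j] = 1
            break
    return labels

def sv_punctuation(tokens):
    # SV -> Punctuation: as above, but stop at the first . ! ? after the
    # verb. (Punctuation handling is simplified here; the paper's
    # baselines leave punctuation tokens unlabelled.)
    labels = [0] * len(tokens)
    for i, tok in enumerate(tokens):
        if tok in SPEECH_VERBS:
            for j in range(i, len(tokens)):
                labels[j] = 1
                if tokens[j] in TERMINATORS:
                    break
            break
    return labels

line1 = "– He looks like a thief , said Sellén , watching him from the window with a sly smile .".split()
line2 = "– Hurry up , Olle ! he shouted after the retreating figure . Buy six French rolls .".split()
print(sv_end(line1))
print(sv_punctuation(line2))
```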
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>The baselines and the model’s performance were measured by calculating the
precision, recall, and F1-score from the token predictions of each line. The model’s
performance is reported as the average precision, recall and F1-score from the
cross-validation. The results from the baselines and the model are shown in
Table 3.</p>
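      <p>The token-level scores have their standard definitions; for a single line they can be computed as in the following sketch (ours, for illustration):</p>

```python
# Token-level precision, recall and F1-score for one line
# (our sketch of the evaluation described above).
def prf1(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [0, 0, 1, 1, 1, 1]   # toy gold narration labels for one line
pred = [0, 1, 1, 1, 0, 1]   # toy predictions
p, r, f = prf1(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))
```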
      <p>As expected, the SV ! End baseline has the lowest F1 performance, but a
higher recall. SV ! Punctuation performs better in terms of precision and
F1score. The logistic regression model shows great improvements on all metrics: a
gain of 34.1 percentage points in precision, 21.3 percentage points in recall and
32.9 percentage points in F1-score when compared against the best baselines.</p>
      <p>To investigate the influence of the context window size, the model was tested
with a window of 0–9 tokens. Figure 1 shows the results of this, which indicate
that a context window of 0 to 1 performs poorly in comparison to larger
windows. The model’s performance stabilizes when the context window is 4 tokens
or larger, only showing minor fluctuations thereafter.</p>
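      <p>The context window used here can be sketched as follows (our illustration): for window size w, the features of the w preceding and w succeeding tokens are copied in with an offset prefix; the padding feature for out-of-line positions is our invention:</p>

```python
# Illustrative sketch of context-window feature extraction: for window
# size w, the neighbours' features are copied in with an offset prefix.
# The "pad" feature for out-of-line positions is our invention.
def window_features(line_feats, i, w):
    feats = {"0:" + k: v for k, v in line_feats[i].items()}
    for offset in range(1, w + 1):
        for j, sign in ((i - offset, "-"), (i + offset, "+")):
            prefix = sign + str(offset) + ":"
            if j in range(len(line_feats)):   # in-bounds neighbour
                for k, v in line_feats[j].items():
                    feats[prefix + k] = v
            else:                              # outside the line
                feats[prefix + "pad"] = True
    return feats

line = [{"lemma": "said"}, {"lemma": "Sell\u00e9n"}, {"lemma": ","}]
f = window_features(line, 1, 1)
print(sorted(f))
```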
      <p>In addition to evaluating precision, recall and F1-score, the logistic regression
model and the baselines were evaluated on their ability to find full and partial
solutions to complete lines (see Table 4). In a Partial solution, at least one
of the tokens belonging to narration in the line is found. Alternatively, if all
these tokens are found, but in addition some direct speech tokens are classified as
belonging to narration, the line is also regarded as partially correct. In a None
solution, the model does not find any narration tokens in a line that contains
narration tokens. The columns Full, Partial and None represent all lines that
contain narration.
</p>
      <p>[Figure 1: model performance as a function of context window size, 0–9 tokens.]</p>
      <p>FP-error is the proportion of lines predicted to have narration tokens but
which do not have any narration tokens in the annotated data. The results,
shown in Table 4, are based on the average number of solutions obtained using
cross-validation.</p>
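      <p>The Full, Partial, None and FP-error categories can be made precise with a few lines of code; the sketch below is our reading of the definitions above:</p>

```python
# Our reading of the Full/Partial/None definitions for one line.
def solution_type(gold, pred):
    gold_idx = {i for i, g in enumerate(gold) if g == 1}
    pred_idx = {i for i, p in enumerate(pred) if p == 1}
    if not gold_idx:
        # Line without narration: any predicted narration is an FP-error.
        return "FP-error" if pred_idx else "correct"
    if pred_idx == gold_idx:
        return "Full"
    if gold_idx.intersection(pred_idx):   # at least one narration token found
        return "Partial"
    return "None"

print(solution_type([1, 1, 0], [1, 1, 0]))
print(solution_type([1, 1, 0], [1, 0, 0]))
print(solution_type([1, 1, 0], [0, 0, 0]))
print(solution_type([0, 0, 0], [1, 0, 0]))
```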
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <sec id="sec-5-1">
        <title>Model evaluation</title>
        <p>The problem appears to be easy at first glance, but the baselines show that
the simple rule-based heuristics are unable to capture most cases of narration
in dialogue. A simple logistic regression model overcomes many of the
weaknesses of the baselines and performs well on the task. One of the main
strengths of the model is that it is able to detect narration in lines solely based
on the tokens in the line, with no other contextual information available. In
Figure 1 we examined the influence of context window size on the performance
and showed that using more than four context tokens only has minor effects. This
indicates that while there may be long-range dependencies, the most important
features are captured in a context window of four tokens.
</p>
      </sec>
      <sec id="sec-5-2">
        <title>Speech verbs</title>
        <p>Table 4 shows that the model finds solutions to 96.9% (Full + Partial
solutions) of the lines with narration, which is an increase of 36.4 percentage points
compared to the SV → Punctuation baseline. This increase corresponds to the
number of additional solutions the model found compared to the baselines, i.e.,
the reduction of None errors. A weakness of both baselines is their reliance on
pre-defined speech verbs. Having a pre-defined list of speech verbs is not realistic
when generalizing to unseen books. The results from Table 4 indicate that the
model is able to recognize narration beyond the pre-defined list of speech verbs
by instead learning these indicators from the data.
</p>
      </sec>
      <sec id="sec-5-3">
        <title>Error analysis</title>
        <p>Looking closer at the errors the model makes on partial solutions, one problem is
that the model predicts short segments of speech within narrations. For example,
in the sentence below (bold indicates tokens tagged as narration):
(3) [– You spoke about an introduction, said Rissen to the woman without
attaching himself to me. How do you get an introduction?]</p>
        <p>Most of the narration tokens are correctly classified except the tokens
[himself to], which have been classified as speech rather than narration. The model
predicts that narration may be interrupted by speech and then continued. This
has not been observed in the data and the error arises because the features used
capture lexical properties but do not explicitly capture structural information
about narration length. This problem could be avoided by including
information about the typical length of narration within dialogues and by restricting
the number of narration sequences to one per line. In other words, for any line,
there would be at most one continuous sequence of narration tokens.</p>
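        <p>This restriction could be implemented as a post-processing step. The sketch below is a hypothetical suggestion of ours, not something evaluated in the paper: it keeps only the longest contiguous run of predicted narration tokens in a line.</p>

```python
from itertools import groupby

# Hypothetical post-processing (our suggestion, not evaluated in the
# paper): keep only the longest contiguous run of predicted narration
# tokens, dropping shorter spurious spans.
def keep_longest_run(labels):
    runs, pos = [], 0
    for value, group in groupby(labels):
        length = len(list(group))
        if value == 1:
            runs.append((length, pos))
        pos += length
    out = [0] * len(labels)
    if runs:
        length, start = max(runs)   # longest run of 1s
        for i in range(start, start + length):
            out[i] = 1
    return out

# A short spurious second span (cf. the [himself to] error) is dropped.
print(keep_longest_run([0, 1, 1, 1, 0, 1, 0]))
```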
        <p>Related to the above problem, the model makes mistakes when a character
uses a speech verb in direct speech. This problem could also be avoided by
restricting the number of narration sequences to one. However, speech verbs
appear to be important for the model and such a solution may prefer tagging only
the speech verb as narration rather than the actual sequence of narration tokens.
To avoid this, the restriction to one narration sequence should be combined with
(a) narration length (e.g., prefer a longer sequence to a shorter one) and/or (b)
assigning a lower weight to the speech verb feature.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and future work</title>
      <p>This paper introduces the novel task of narration detection in passages of
dialogue in prose fiction, and reports the first results on this using a logistic
regression model and data from four Swedish novels. Due to the lack of previous work,
the model is compared against two baselines and is shown to achieve significant
gains over both of them. Most of our features are lexical in nature, but our
error analysis indicates that more structural features would be needed in order to
further improve the model. In future work, we hope to remedy this and plan
to generalize the task to multiple languages. We expect this kind of method to
be a valuable component for the purpose of a more fine-grained analysis of
fictional dialogue than has previously been possible, and for the further analysis
of narrative structure in general.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>This work has been supported by an infrastructure grant from the Swedish
Research Council (SWE-CLARIN, project 821-2013-2003). We want to thank
Murathan Kurfalı for very valuable comments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Dolezel</surname>
          </string-name>
          ,
          <year>1973</year>
          . Dolezel,
          <string-name>
            <surname>L.</surname>
          </string-name>
          (
          <year>1973</year>
          ).
          <article-title>Narrative Modes in Czech Literature</article-title>
          . University of Toronto Press.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Ek</surname>
          </string-name>
          et al.,
          <year>2018</year>
          . Ek,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Wirén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Östling</surname>
          </string-name>
          ,
          <string-name>
            <surname>R.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nilsson</given-names>
            <surname>Björkenstam</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          , Grigonytė,
          <string-name>
            <surname>G.</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gustafson</given-names>
            <surname>Capková</surname>
          </string-name>
          ,
          <string-name>
            <surname>S.</surname>
          </string-name>
          (
          <year>2018</year>
          ).
          <article-title>Identifying speakers and addressees in dialogues extracted from literary fiction</article-title>
          .
          <source>In Language Resources and Evaluation Conference</source>
          , Miyazaki, Japan,
          <fpage>7</fpage>
          -12 May
          <year>2018</year>
          . European Language Resources Association.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>He</surname>
          </string-name>
          et al.,
          <year>2013</year>
          . He,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Barbosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            , and
            <surname>Kondrak</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Identification of speakers in novels</article-title>
          .
          <source>In ACL (1)</source>
          , pages
          <fpage>1312</fpage>
          -
          <lpage>1320</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Jahn</surname>
          </string-name>
          ,
          <year>2017</year>
          . Jahn,
          <string-name>
            <surname>M.</surname>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Narratology: A guide to the theory of narrative</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Muzny</surname>
          </string-name>
          et al.,
          <year>2017</year>
          . Muzny,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            , and
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>A two-stage sieve approach for quote attribution</article-title>
          .
          <source>In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</source>
          , pages
          <fpage>460</fpage>
          -
          <lpage>470</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Nivre</surname>
          </string-name>
          et al.,
          <year>2006</year>
          . Nivre,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            , and
            <surname>Nilsson</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          (
          <year>2006</year>
          ).
          <article-title>MaltParser: A data-driven parser-generator for dependency parsing</article-title>
          .
          <source>In Proceedings of LREC</source>
          , volume
          <volume>6</volume>
          , pages
          <fpage>2216</fpage>
          -
          <lpage>2219</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Östling</surname>
          </string-name>
          ,
          <year>2013</year>
          . Östling,
          <string-name>
            <surname>R.</surname>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Stagger: An open-source part-of-speech tagger for Swedish</article-title>
          .
          <source>Northern European Journal of Language Technology (NEJLT)</source>
          ,
          <volume>3</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Pedregosa</surname>
          </string-name>
          et al.,
          <year>2011</year>
          . Pedregosa,
          <string-name>
            <given-names>F.</given-names>
            ,
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            ,
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            ,
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <surname>V.</surname>
          </string-name>
          , et al. (
          <year>2011</year>
          ).
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of machine learning research</source>
          ,
          <volume>12</volume>
          (Oct):
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>