<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Automatic Segmentation of Narrative Text Into Scenes According to SceneML</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tarfah Alrashid</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert Gaizauskas</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (eds.): Proceedings of the Text2Story'25 Workshop</institution>
          ,
          <addr-line>Lucca</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Jeddah</institution>
          ,
          <addr-line>Jeddah</addr-line>
          ,
          <country country="SA">Saudi Arabia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Sheffield</institution>
          ,
          <addr-line>Sheffield</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>Automatically segmenting narrative text into scenes is a complex task that remains relatively underexplored. Scenes form fundamental structural units within narratives, marking shifts in time, location, and character interactions. In this paper, we introduce a supervised learning approach to scene segmentation, using SceneML, an annotation framework for narrative text. We evaluate multiple models, including BERT-based classifiers and Conditional Random Fields (CRF), treating scene segmentation as a sentence classification and sequence labeling task. Our experiments show that the BERT cased model achieves the highest balanced accuracy of 0.58 and an F1 score of 0.24 for the minority class. However, statistical tests revealed no significant differences among BERT-based models but highlighted distinctions between CRF models and BERT models. These results indicate that while supervised learning models can improve scene segmentation, further refinements are needed. We discuss potential enhancements, including sequence-based transformer models, integration of temporal and geographical references, and the investigation of decoder-only models such as GPT-3 and GPT-4. Our findings highlight both the progress and challenges in automating scene segmentation and provide directions for future research.</p>
      </abstract>
      <kwd-group>
        <kwd>Narrative text</kwd>
        <kwd>scene segmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>Narrative texts, whether in literature, film scripts, or storytelling applications, often follow
a structured progression of scenes that convey events, character interactions, and shifts in
time and location. Automatically segmenting these texts into scenes can enhance various
natural language processing (NLP) tasks such as text summarization, information retrieval, and
interactive storytelling. It is also of interest to literary scholars studying variation in narrative
structure within and across authors. However, scene segmentation remains a challenging
problem due to the complexity of defining and identifying boundaries within a continuous text.</p>
      <p>Existing studies on text segmentation primarily focus on topic-based segmentation, lexical
cohesion, and discourse structure, but these approaches are insufficient for capturing scene-level
transitions in narratives. Previous work specifically on the segmentation of narrative texts has
investigated lexical cohesion measures, supervised classification, and event boundary detection,
yet none of this work is set within a broad framework for the annotation of narrative structure,
and its results have been limited.</p>
      <p>This paper introduces an approach to scene segmentation based on SceneML, an annotation
framework designed for narrative text [1]. We develop and evaluate supervised learning models
that leverage contextual and linguistic features to automatically segment narrative texts into
coherent scenes. Our work differs from prior studies by incorporating a more comprehensive
scene representation, addressing scene transitions, and using machine learning techniques to
enhance segmentation performance.</p>
      <p>The remainder of this paper is structured as follows: Section 2 reviews related work in
text segmentation and scene detection. Section 3 describes the dataset used for training and
evaluation. Section 4 presents the models and experimental setup. Section 5 discusses the
results and their implications, followed by the conclusion in Section 6.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        Automatic segmentation of narrative text into scenes remains underexplored. Existing studies
address related tasks such as lexical cohesion-based segmentation, feature-based segmentation,
and event segmentation but lack comprehensive scene annotation frameworks. Kozima and
Furugori[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] use what they call a Lexical Cohesion Profile (LCP) to detect scene boundaries
through shifts in lexical cohesion, but the approach’s reliance on fixed window sizes limits
its applicability. Kauchak and Chen [3] frame segmentation as a classification task, using an
SVM classifier with lexical and structural features, yet their approach disregards sequential
dependencies, leading to potentially inconsistent segment lengths. Event segmentation has also
been studied, focusing on narrative shifts in film based on location, character, and time, though
this work does not provide computational models applicable to text [4]. Closest to our work, a
more recent study by Zehe et al. [5] developed a scene annotation scheme for German narratives
and tested unsupervised and supervised segmentation models. However, their definition of
scene differs from ours: it requires not only that a scene be a portion of a narrative in which
location, characters and time are coherent, i.e. do not change, but also that it centre on a single
action. We do not impose this last condition. Overall, their annotation scheme is quite
limited and their segmentation model achieves relatively weak performance (F1 = 0.24). Unlike
previous work, our approach builds on a more comprehensive annotation framework, SceneML,
that captures a broader range of narrative scene dynamics.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Data Set</title>
      <p>The dataset used for our study – the ScANT corpus [6] – was constructed for the study of
narrative structure and is composed of selected chapters from children’s stories and adult novels
that are no longer protected by copyright. Children’s stories were specifically chosen with the
expectation that they would exhibit a relatively simple narrative structure. Conversely, adult
novels were included to incorporate more complex narratives, posing a greater challenge for
automated analysis of narrative structure. There are three sources for the dataset. The first
is ‘Bunnies from the Future’, a middle-grade children’s story authored by Joe Corcoran. The
second source is ‘The Wonderful Wizard of Oz’, originally part of the Brown Corpus. Finally, the
third source comprises ‘Pride and Prejudice’, ‘A Tale of Two Cities’, ‘The Adventures of Sherlock
Holmes’ and ‘The Great Gatsby’, obtained from Project Gutenberg. The dataset is annotated with
a subset of the SceneML elements proposed in [1], specifically just scene, scene description
segment (SDS) and scene transition segment (STS), along with the more recently added
non-scene element. In brief these may be described as follows:
Scene A scene is defined as a unit of narrative in which the time, location and principal
characters are constant and in which specific events which constitute the narrative are
recounted. Any change in these three elements indicates a change of scene.
Scene Description Segment (SDS) A scene is realised in written forms of narrative through
one or more, potentially non-contiguous, scene description segments (SDSs), themselves
contiguous sequences of sentences all narrating the same scene. The SDS mechanism
allows the relating of one scene in a narrative to be embedded within another, as, for
example, in flashback or flashforward.</p>
      <p>Scene Transition Segment (STS) Some passages describe not one scene or another but rather
the transition between scenes. So, one SDS describing a conversation between two
characters A and B in location L could be followed by a single sentence “As soon as B had
left, A jumped in a taxi and drove to L′”. At L′ a new scene might then unfold. The single
sentence joining the two SDSs does not belong to the first scene nor to the second. And it
does not constitute a scene in its own right, as no narrative-significant action takes place
during the time it describes, save the transition of A to a new location. Its sole narrative
function is to indicate a transition from one scene to another. Such elements SceneML
refers to as scene transition segments (STSs).</p>
      <p>Non-scene Elements Aside from STSs, other elements are also present in narrative text.</p>
      <p>These include general philosophising or opinion segments, background information
segments, and narrative summary or narrative catchup (e.g. “It was the best of times, it
was the worst of times, it was the age of wisdom, it was the age of foolishness …” from
Charles Dickens’ A Tale of Two Cities). These passages serve a variety of functions but do
not relate specific, situated events involving protagonists in the story. All such passages
SceneML designates as non-scene elements.</p>
      <p>The ScANT dataset has 2,796 sentences, 55,635 words and 191 SDSs.1</p>
    </sec>
    <sec id="sec-5">
      <title>4. Models</title>
      <p>To build a model that can automatically segment narrative text into scenes (SDSs) using machine
learning, first, we need to train the model using training data. To make the problem easier for
automatic scene segmentation, we treated the task as a sentence classification problem instead
of text segmentation, where each sentence is given a tag (i.e. 1 is designated for sentences on
the boundary of an SDS, either at the beginning or the end, and 0 otherwise). Scene Transition
Segments are not considered as a separate classification task here as their numbers in the
annotated data were very small compared to the number of annotated SDSs.
1The corpus is free for research purposes and is available from https://doi.org/10.15131/shef.data.21517908.v1</p>
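      <p>As a concrete illustration of this tagging scheme, the following sketch (our own code, not the authors’; the function name and the per-chapter list of SDS spans are hypothetical) derives the per-sentence 0/1 boundary tags from annotated SDS boundaries:</p>

```python
# Illustrative sketch: deriving per-sentence boundary tags from annotated
# SDS spans. `sds_spans` is a hypothetical list of (start, end) sentence
# indices, one pair per SDS in a chapter.

def boundary_tags(n_sentences, sds_spans):
    """Tag 1 for sentences at the beginning or end of an SDS, 0 otherwise."""
    tags = [0] * n_sentences
    for start, end in sds_spans:
        tags[start] = 1
        tags[end] = 1
    return tags

# Example: a 10-sentence chapter containing two SDSs.
print(boundary_tags(10, [(0, 4), (5, 9)]))
# [1, 0, 0, 0, 1, 1, 0, 0, 0, 1]
```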
      <p>The machine-learning models were trained and tested using the ScANT corpus. To ensure
robust evaluation, stratified 10-fold cross-validation was implemented using the scikit-learn
library. This technique splits the data into 10 equally sized folds while preserving the class
distribution, allowing the models to be trained and tested on each fold independently. This
approach helps in obtaining reliable performance estimates for the models. Notably, the data
were not shuffled, to preserve sentence order. The following section presents and explains the
machine-learning models used for the task.</p>
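      <p>The evaluation protocol above can be sketched as follows (a minimal example with placeholder data, not the authors’ code): stratified 10-fold cross-validation in scikit-learn with shuffling disabled so that sentence order is preserved.</p>

```python
# Stratified 10-fold cross-validation without shuffling, as described in
# the text. X and y are stand-ins for sentence features and boundary tags.
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.arange(100).reshape(-1, 1)   # placeholder sentence features
y = np.array([0] * 90 + [1] * 10)   # imbalanced boundary tags

skf = StratifiedKFold(n_splits=10, shuffle=False)
for train_idx, test_idx in skf.split(X, y):
    # each test fold preserves the 90/10 class distribution
    assert (y[test_idx] == 1).sum() == 1
```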
      <p>Three machine-learning models were trained and tested on the annotated data. Then, we
compared the models’ performances to determine which model is optimal for our task. The
following subsections provide a brief description of each of the models.</p>
      <sec id="sec-5-1">
        <title>4.1. Model 1 - The Conditional Random Field (CRF) Model</title>
        <p>In the first model, we treated the problem as a sequence-labelling problem, where the order of
sentences is significant and the wider textual context of the sentence being labelled is important.
Herein, the sequence refers to the ordered sentences of each chapter and their corresponding
tags. A CRF model was trained on the training data. For this endeavour, we first extracted the
following features:
• Transitioning phrases: This is a binary feature, where if the sentence contains a
transitioning phrase, the feature is given a tag of 1, and 0 otherwise. This feature aims to identify
transitions between different segments within the text. Transitioning words/phrases (e.g.
later on, after, etc.) are hypothesised to appear more often in sentences on the boundaries of a
scene.
• Beginning or end of a paragraph: This is also a binary feature, where if the sentence
occurs at the beginning or end of a paragraph, the feature is given a tag of 1, and 0
otherwise. This feature aims to capture paragraph-level patterns that might influence the
classification of the current sentence.
• End of a chapter (true/false): This binary feature denotes whether the current sentence
occurs at the end of a chapter, as the end of a chapter usually indicates the end of a scene.
• Part-of-speech (POS) tags: Incorporating the part-of-speech tags of each word in the
current sentence being classified was carried out using spaCy. In addition, POS tags were
extracted for the two preceding sentences and the two following sentences.
• Named entity: Each word in the sentence being classified was given a BIO tag. The
Named Entity Recognition (NER) function used was implemented by spaCy, using the
NER model en_core_web_md. Named entities can include names of people, organisations,
locations or other specific entities. In addition, the words of the two preceding sentences
and the two following sentences were also given named entity tags.
• Contextual information (2 sentences before and after): This feature considers the two
sentences preceding and the two sentences following the current sentence. By
incorporating neighbouring sentences, the model can capture contextual dependencies and the
influence of surrounding information on the classification of the current sentence. This
information is presented to the model as a set of features: the same set of features extracted
from the test sentence is also extracted from the two preceding sentences and the two
following.
• Visually descriptive language (VDL): Visually descriptive information as described in [7]
is used here as a feature, on the basis of the hypothesis that a scene change is likely to
include a description of a new setting. A classifier was developed to classify sentences as
(0, 1, or 2), where:
– 0 tag: not visually descriptive
– 1 tag: visually descriptive
– 2 tag: partially visually descriptive
To assess the effectiveness of the VDL feature on the performance of the CRF classifier,
the model was tested twice: once with the VDL feature added to the list of features and
once without.</p>
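        <p>The per-sentence feature extraction above might be sketched as follows. This is our own illustration, not the authors’ code: the paper does not name the CRF implementation, so we assume feature dictionaries in the style expected by sklearn-crfsuite; the transition-phrase list and helper names are hypothetical, and the spaCy POS/NER features are omitted to keep the sketch self-contained.</p>

```python
# Hypothetical sketch of the CRF feature dictionaries described above.
TRANSITION_PHRASES = {"later on", "after", "meanwhile", "the next day"}

def sentence_features(sents, i, para_edges, chapter_end, vdl_tags=None):
    """Features for sentence i; para_edges holds indices of sentences that
    begin or end a paragraph, chapter_end is the chapter's last index."""
    sent = sents[i].lower()
    feats = {
        "has_transition_phrase": any(p in sent for p in TRANSITION_PHRASES),
        "para_boundary": i in para_edges,
        "chapter_end": i == chapter_end,
    }
    if vdl_tags is not None:            # optional VDL feature (0, 1 or 2)
        feats["vdl"] = vdl_tags[i]
    # context window: same cue for the two preceding/following sentences
    for off in (-2, -1, 1, 2):
        j = i + off
        if 0 <= j < len(sents):
            ctx = sents[j].lower()
            feats[f"ctx{off}:has_transition_phrase"] = any(
                p in ctx for p in TRANSITION_PHRASES)
    return feats
```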
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Model 2 - Bidirectional Encoder Representations from Transformers (BERT)</title>
        <p>The second model developed is a deep-learning model that uses a pre-trained language model.
The ktrain library [8] was utilised to implement the model, with the use of BERT [9] from
Hugging Face transformers. Two experiments were conducted on the model to explore the
most effective implementation: one with BERT cased and one with BERT uncased. The model
was fine-tuned using a learning rate of 1.44E-05, with 3 epochs and a maximum length of 128 for
Bert-Cased, and 5 epochs and a maximum length of 256 for Bert-Uncased.</p>
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Model 3 - Sentence Pair Classification with BERT</title>
        <p>In an attempt to capture as much context as possible, the task of scene segmentation was also
treated as a sentence pair classification task, where the relationship between a current sentence
and its surrounding context is assessed. The input consists of a pair: the first part is
the current sentence and the second part is the concatenated form of the sentence itself
along with the two preceding and two following sentences. This allows the broader context
surrounding the current sentence to be considered during classification (see Figure 1). As with
model 2, this model was implemented using the ktrain library. To explore different variations,
two experiments were conducted using pre-trained BERT models from the Hugging Face model
repository: BERT cased and BERT uncased.</p>
        <p>The model was fine-tuned using a learning rate of 1.44E-05, with 10 epochs and a maximum
length of 512 for both Bert-Cased and Bert-Uncased.</p>
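        <p>The pair construction described above can be illustrated as follows (function and variable names are ours, not the authors’): each sentence is paired with the concatenation of itself and its two preceding and two following sentences.</p>

```python
# Illustrative construction of the sentence-pair inputs.

def make_pairs(sentences):
    """Pair each sentence with a 5-sentence context window around it."""
    pairs = []
    n = len(sentences)
    for i, sent in enumerate(sentences):
        context = " ".join(sentences[max(0, i - 2):min(n, i + 3)])
        pairs.append((sent, context))
    return pairs

sents = ["S1.", "S2.", "S3.", "S4.", "S5."]
print(make_pairs(sents)[2])  # ('S3.', 'S1. S2. S3. S4. S5.')
```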
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Results</title>
      <p>[Figure 1: sentence-pair input construction – each sentence in [s1, s2, …, sn] is mapped to the
pair (si, ci), where ci = si−2 + si−1 + si + si+1 + si+2.]</p>
      <p>Table 1 presents the performance results of the six machine-learning models, namely, CRF;
CRF(VDL), which refers to the CRF model with the VDL feature added; BERT cased; BERT uncased;
Sent-Pair Cased, which is sentence pair classification with a BERT cased model; and Sent-Pair
Uncased (with BERT base uncased). The models are evaluated using different metrics, including
accuracy, balanced accuracy, precision (with both macro and weighted average), recall (with
both macro and weighted average) and F1 (with both macro and weighted average, and an F1 score
for each class, 0 and 1, independently). Tenfold cross-validation was used to test each of the six
models.</p>
      <p>The findings indicate that accuracy alone showed relatively high values across the models
(ranging from 0.87 to 0.92). However, since the dataset is highly imbalanced (there are many
more 0 tags than 1 tags), accuracy alone is not sufficient to compare models. We added
other metrics that can give better insight into the performance of the models, such as balanced
accuracy and F1 for each individual class.</p>
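      <p>A toy illustration of why accuracy alone is misleading here (the numbers below are invented, not taken from the paper): a most-common-class baseline on an 18-to-2 imbalanced label set looks strong on plain accuracy but collapses on balanced accuracy and minority-class F1.</p>

```python
# Comparing plain accuracy with the imbalance-aware metrics used in the
# paper, on a tiny invented example.
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score

y_true = [0] * 18 + [1] * 2
y_pred = [0] * 20                                  # most-common-class baseline

print(accuracy_score(y_true, y_pred))              # 0.9 - looks strong
print(balanced_accuracy_score(y_true, y_pred))     # 0.5 - chance level
print(f1_score(y_true, y_pred, pos_label=1, zero_division=0))  # 0.0
```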
      <p>As can be seen, most of the metrics used in testing the models yielded highly similar results,
which made it difficult to determine which model performed best. We therefore focus on
two metrics that better reflect the performance of models on imbalanced datasets. The first,
balanced accuracy, is often considered when one of the classes is a lot larger than the
other. The BERT cased model achieved the highest balanced accuracy of 0.58, indicating its
ability to handle imbalanced data better than the other models.</p>
      <p>In addition, we obtained the F1 score for each class and focused on the results for the minority
class (class 1), which give an insight into which of the models performs better at
predicting class 1. Again, the BERT cased model scored the highest, with a 0.24 F1 score.</p>
      <p>Notably, although the BERT cased model achieved the highest scores among all models in
terms of balanced accuracy and F1 (for class 1), it is difficult to draw a conclusion as the results
for the majority of the models are highly similar. To see whether the differences in our models’
performances are significant, we conducted a statistical test on the results, as reported in the
following section.</p>
    </sec>
    <sec id="sec-7">
      <title>6. Analysis and Discussion</title>
      <p>A statistical analysis was conducted to determine whether there is a significant difference in the
performance of the six models. Our null hypothesis is that there is no significant difference
in the performance of the six models that were used for scene boundary detection. Of the 10
metrics presented in Table 1, balanced accuracy and F1 for class 1 are the two metrics chosen for
the statistical analysis. The metric values for each of the 10 folds were used as the data
samples for the significance test. A Mann–Whitney U test [10] was then carried out on these
performance metrics of the six models. The Mann–Whitney test is a non-parametric test that does
not require normally distributed data and works well with small sample sizes, which is the case in
our task (10 BA and 10 F1 scores for each classifier).</p>
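      <p>The fold-level test described above can be sketched as follows: a two-sided Mann–Whitney U test over two sets of ten per-fold balanced-accuracy scores. The score values below are invented for illustration and are not the paper’s results.</p>

```python
# Two-sided Mann-Whitney U test on per-fold scores (illustrative values).
from scipy.stats import mannwhitneyu

bert_cased_ba = [0.55, 0.61, 0.58, 0.57, 0.60, 0.56, 0.59, 0.58, 0.57, 0.62]
crf_ba        = [0.50, 0.52, 0.51, 0.50, 0.53, 0.51, 0.50, 0.52, 0.51, 0.50]

stat, p = mannwhitneyu(bert_cased_ba, crf_ba, alternative='two-sided')
print(p < 0.05)  # significant difference for these illustrative scores
```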
      <sec id="sec-7-1">
        <title>6.1. Statistical Analysis</title>
        <p>In general, as shown in Table 2, the results showed no significant difference (p-value &gt; 0.05)
in the performance of the models. In terms of the balanced accuracy metric, the p-values ranged
from 0.1859 to 0.7336, suggesting no significant difference in the performances of BERT cased,
BERT uncased, sentence pair with BERT cased and sentence pair with BERT uncased. This is
also the case for the results in terms of F1 for class 1. The p-values for the model comparisons
ranged from 0.1402 to 0.8203, suggesting no significant difference in the performances of BERT
cased, BERT uncased, sentence pair with BERT cased and sentence pair with BERT uncased.
On the other hand, the p-values for both metrics (balanced accuracy and F1 for class 1) showed
a significant difference (p-value &lt; 0.05) in the performance of BERT cased compared with CRF
and CRF with the VDL feature.</p>
        <p>In addition, another statistical analysis was conducted to determine whether there is a significant
difference in the performance of the six models compared to the MCC baseline. Table 3 shows
Mann–Whitney p-value results between the most common class (MCC) classifier and each of the
six models. These p-values were obtained for 10-fold BA, as the F1 for the minority class will
be all 0s for the MCC. The results show that there is a significant difference in performance
between the MCC and sentence pair with Bert Uncased, Bert Cased, and Bert Uncased. There is
marginally significant evidence of a difference with CRF-VDL. And finally, there is no strong
evidence of a difference with sentence pair with Bert Cased and CRF.</p>
      </sec>
      <sec id="sec-7-2">
        <title>6.2. Discussion</title>
        <p>The stronger performance of BERT could be attributed to the fact that BERT is pre-trained on
the BookCorpus collected by [11]. The BookCorpus is made up of 11,038 novels from 16 different
genres (e.g. romance, science fiction, fantasy, etc.). Therefore, BERT has seen narrative text
previously.</p>
        <p>Overall, the findings suggest that the choice of model (BERT cased, BERT uncased, sentence
pair with BERT cased and sentence pair with BERT uncased) may not significantly impact
performance in our tasks. Users can select the model that best aligns with their specific
requirements or preferences without compromising performance.</p>
        <p>However, there is no significant difference either between the CRF models and some of the
BERT models, or between the two CRF models. This could suggest: (1) the power of language
models pretrained on large amounts of text and then fine-tuned for the task outweighs the use
of features engineered for this specific task but then trained on a small amount of labelled data;
(2) VDL either offers no help for this task, or the accuracy level of the VDL classification is too
low to be useful here.</p>
        <p>Finally, comparing the performance of our models to those of Zehe et al. [5] on the binary
scene segmentation task they define, we see that our results are broadly similar (0.24 F1 measure).
Given differences in task definition and dataset, not too much should be made of this without
further investigation. However, both their efforts and ours suggest this is indeed a hard task.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7. Conclusion</title>
      <p>Among the models evaluated, the BERT cased model achieved the highest performance for scene
segmentation, with a balanced accuracy of 0.58 and an F1 score of 0.24 for the minority class.
However, statistical analysis using the Mann–Whitney test revealed no significant differences
among the BERT-based models, including their cased and uncased versions. Additionally, while
there was a significant difference between the CRF models and the BERT cased model, no
significant difference was found between the sentence pair BERT cased model and the MCC
baseline. Interestingly, a marginally significant difference was observed between MCC and
CRF-VDL, whereas no significant difference was found between MCC and the standard CRF
model. These findings highlight the complexity of the scene segmentation task and suggest
that while BERT-based models demonstrate improved performance, the differences among
approaches may not be substantial.</p>
    </sec>
    <sec id="sec-9">
      <title>8. Future Work</title>
      <p>Although we have made progress in investigating supervised models for scene segmentation,
there is clearly still room for substantial improvement. As an initial step, designating some of
our corpus as development data, as distinct from training and test data, would allow us to
conduct some failure analysis to determine which cases in particular the various models find
challenging. Of course, acquiring more labelled data should also help – how sensitive task
performance is to training set size is not known.</p>
      <p>
        In terms of model refinement and variation, one possible enhancement is incorporating a
geographical and temporal reference extraction model such as [12], which could help to detect
scene-related entities and changes more effectively. Additionally, decoder-only models, such
as GPT-3 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] or GPT-4 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] should be explored for this task. These models could be used
in a zero-shot or few-shot learning setting, where they classify scene boundaries with little
to no task-specific training data. Another possible direction would be to develop a model
that, as with our CRF approach, treats scene segmentation as a sequence-labeling task at the
whole sentence level, but uses sentence embeddings as sentence representations. I.e., instead
of handling segmentation as a classification task at the sentence or sentence pair level, as we
do in our BERT-based models, this approach would learn to assign sentence-level labels across
sequences of sentence embeddings, potentially making better use of longer range contextual
dependencies.
      </p>
      <p>Another interesting and potentially important refinement would be incorporating the detection
of non-scene segments and scene transition segments (STSs) into the task. Learning to identify
these segments explicitly could potentially improve scene segmentation accuracy and would
also result in a better representation of the overall narrative structure.</p>
    </sec>
    <sec id="sec-10">
      <title>Acknowledgements</title>
      <p>The authors thank the Text2Story reviewers for their helpful comments. The first author
acknowledges support from the University of Jeddah in the form of a PhD studentship.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Gaizauskas</surname>
          </string-name>
          , T. Alrashid,
          <article-title>SceneML: A proposal for annotating scenes in narrative text</article-title>
          ,
          <source>in: Proceedings of the 15th Workshop on Interoperable Semantic Annotation (ISA-15)</source>
          , Gothenburg, Sweden,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kozima</surname>
          </string-name>
          , T. Furugori,
          <article-title>Segmenting narrative text into coherent scenes</article-title>
          ,
          <source>Literary and Linguistic Computing</source>
          <volume>9</volume>
          (
          <year>1994</year>
          )
          <fpage>13</fpage>
          -
          <lpage>19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kauchak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Feature-based segmentation of narrative documents</article-title>
          ,
          <source>in: Proceedings of the ACL Workshop on Feature Engineering for Machine Learning in Natural Language Processing - FeatureEng '05</source>
          ,
          Association for Computational Linguistics, Morristown, NJ, USA,
          <year>2005</year>
          , p.
          <fpage>32</fpage>
          . URL: http://www.aclweb.org/anthology/W/W05/W05-0405. doi:10.3115/1610230.1610237.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Cutting</surname>
          </string-name>
          ,
          <article-title>Event segmentation and seven types of narrative discontinuity in popular movies</article-title>
          ,
          <source>Acta Psychologica</source>
          <volume>149</volume>
          (
          <year>2014</year>
          )
          <fpage>69</fpage>
          -
          <lpage>77</lpage>
          . URL: http://linkinghub.elsevier.com/retrieve/pii/S000169181400078X. doi:10.1016/j.actpsy.2014.03.003.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zehe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Konle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. K.</given-names>
            <surname>Dümpelmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hotho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jannidis</surname>
          </string-name>
          , L. Kaufmann,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krug</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Puppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Reiter</surname>
          </string-name>
          , et al.,
          <article-title>Detecting scenes in fiction: A new segmentation task</article-title>
          ,
          <source>in: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics:</source>
          Main Volume,
          <year>2021</year>
          , pp.
          <fpage>3167</fpage>
          -
          <lpage>3177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>T.</given-names>
            <surname>Alrashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Gaizauskas</surname>
          </string-name>
          ,
          <article-title>ScANT: A small corpus of scene-annotated narrative texts</article-title>
          ,
          <source>in: Proceedings of the Text2Story'23 Workshop</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>149</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Gaizauskas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramisa</surname>
          </string-name>
          ,
          <article-title>Defining visually descriptive language</article-title>
          ,
          <source>in: Proceedings of the Fourth Workshop on Vision and Language</source>
          , Association for Computational Linguistics, Lisbon, Portugal,
          <year>2015</year>
          , pp.
          <fpage>10</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>Maiya</surname>
          </string-name>
          ,
          <article-title>ktrain: A low-code library for augmented machine learning</article-title>
          ,
          <source>arXiv preprint arXiv:2004.10703</source>
          (
          <year>2020</year>
          ). arXiv:2004.10703.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>T. W.</given-names>
            <surname>MacFarland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Yates</surname>
          </string-name>
          ,
          <article-title>Mann-Whitney U test</article-title>
          ,
          <source>Introduction to nonparametric statistics for the biological sciences using R</source>
          (
          <year>2016</year>
          )
          <fpage>103</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kiros</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zemel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Urtasun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Torralba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fidler</surname>
          </string-name>
          ,
          <article-title>Aligning books and movies: Towards story-like visual explanations by watching movies and reading books</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>I.</given-names>
            <surname>Ezeani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rayson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. N.</given-names>
            <surname>Gregory</surname>
          </string-name>
          ,
          <article-title>Extracting imprecise geographical and temporal references from journey narratives</article-title>
          ,
          <source>in: Text2Story@ECIR</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>113</fpage>
          -
          <lpage>118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Brown</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ryder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Subbiah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Kaplan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dhariwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Neelakantan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Shyam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sastry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Askell</surname>
          </string-name>
          , et al.,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Achiam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Adler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Agarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ahmad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Akkaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. L.</given-names>
            <surname>Aleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Almeida</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Altenschmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Altman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anadkat</surname>
          </string-name>
          , et al.,
          <article-title>GPT-4 technical report</article-title>
          , arXiv preprint arXiv:2303.08774 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>