CEUR-WS.org, Vol-3001, paper 5. PDF: https://ceur-ws.org/Vol-3001/paper5.pdf (dblp: https://dblp.org/rec/conf/konvens/Gombert21)
    Twin BERT Contextualized Sentence Embedding Space Learning and
    Gradient-Boosted Decision Tree Ensembles for Scene Segmentation in
                            German Literature
                                           Sebastian Gombert
                                    Information Center for Education
                    DIPF: Leibniz Institute for Research and Information in Education
                                      Frankfurt am Main, Germany
                                          gombert@dipf.de


                         Abstract

    This paper documents a submission to the shared task on scene
    segmentation hosted at KONVENS 2021 (Zehe et al., 2021b). The aim of
    this shared task was to find methods for segmenting narrative texts
    into different scenes – segments of text where location, time and the
    constellation of characters stay more or less coherent. The task is
    formulated as a sentence classification task in which sentences
    bordering scenes have to be distinguished from in-scene sentences.
    The approach presented in this paper is based on two steps. In the
    first one, a twin BERT training setup is used to learn a sentence
    embedding space in which sentences functioning as scene borders are
    well-separated from in-scene ones. In the second one, the sentence
    embeddings generated by this model are used as feature vectors for a
    gradient-boosted decision tree ensemble which conducts the final
    predictions. In the shared task leaderboard, the system ranked second
    in track 1 and first in track 2.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1   Introduction

Scene segmentation in narrative texts is a novel task in natural language processing introduced by Zehe et al. (2021a). The aim of this task is to segment pieces of literature into scenes – sections of text where the relation of story time and discourse time, the location and the character constellations stay more or less the same. From a formal point of view, this problem can be interpreted as a sentence-in-context classification task where sentences separating scenes have to be distinguished from in-scene ones. This is needed as the typical length of longer narrative texts such as novels prevents techniques useful for subsequent steps of analysis, such as co-reference resolution, from functioning well (Zehe et al., 2021a). With a text segmented into coherent scenes, each scene can be processed separately, improving the performance of such follow-up processing.

This paper presents a participating system at the KONVENS 2021 shared task on scene segmentation (Zehe et al., 2021b) which relies on two steps. In the first one, a BERT-based (Devlin et al., 2019) neural network trained in a twin network setup is used to predict embeddings for respective input sentences (Reimers and Gurevych, 2019). This network was trained to provide an embedding space in which sentences bordering scenes are well-separated from in-scene ones. In the second step, gradient-boosted decision tree ensembles (Mason et al., 1999) are fed these sentence embeddings as feature vectors to carry out the final predictions.

For the shared task evaluations, this system was trained on a data set consisting of various German dime novels in which scene borders had been previously annotated. Participating systems were evaluated in two tracks using F1 scores. In the first track, the models were evaluated on a test set consisting of additional dime novels. In this track, the system presented in this paper achieved second place with an F1 of 0.16. In the second track, domain adaptability was probed by evaluating the systems on a set of contemporary German highbrow literature. Here, the system performed better and ranked first with an F1 of 0.26.

2   Background

2.1   Task Description

In Zehe et al. (2021a), the authors interpreted the task of scene segmentation as a sentence classification task. They defined four different classes of sentences: no border, scene-to-scene, scene-to-nonscene and nonscene-to-scene. The three latter classes are used to mark the different kinds of textual borders among the sentences. They trained a BERT-based (Devlin et al., 2019) classifier utilising a sliding window over multiple sentences for context encoding to carry out the sentence classification.

This approach was evaluated against the unsupervised TextTiling (Hearst, 1997) and TopicTiling (Riedl and Biemann, 2012) methods on a corpus consisting of 15 German dime novels using cross-validation. While the supervised BERT model achieved superior results (γ = 0.15) compared to the unsupervised methods (γ = 0.01; γ = 0.02), the overall results turned out subpar, which led the authors to conclude that scene segmentation can be regarded as an inherently hard task.

For the KONVENS 2021 shared task, the organizers provided an expanded version of the data set presented by Zehe et al. (2021a). This data set is composed of various German dime novels. The authors chose this genre as they deemed it easier for potential models to deal with.

2.2   Related Work

While segmenting text into smaller units such as tokens, sentences or spans is one of the oldest and most researched topics in natural language processing, the task of semantically segmenting narrative texts into scenes is a new one. In this form, scene segmentation was first introduced by Zehe et al. (2021a). From a problem-centric point of view, Zehe et al. (2021a) relate scene segmentation to topic segmentation, the task of segmenting a text by topic changes, as changes of time, place and character constellation can be interpreted as special cases of topic changes.

Most of the more recent work in this area (Riedl and Biemann, 2012; Misra et al., 2011) is built upon latent Dirichlet allocation (Blei et al., 2003). This method discovers fields of words consistently co-occurring in the same contexts. By monitoring changes in their distribution throughout a text, one can define topic-wise section borders. Another related topic according to Zehe et al. (2021a) is discourse coherence. Recent approaches in this area rely on neural networks to detect textual coherence in various setups and use cases (Li and Jurafsky, 2017; Pichotta and Mooney, 2016). Changes in these coherence scores can be used for detecting borders within texts as well.

3   System Description

My code can be found at https://github.com/SGombert/ssts-2021-sego.

3.1   Adjustments to the Tag Set

While Zehe et al. (2021a) used a quaternary tag set distinguishing scene-to-scene and nonscene-to-scene borders, which is also used for the official shared task evaluations, my system internally relies on a tertiary tag set consisting of the tags O, SCENE and NONSCENE, where the latter two refer to the first sentence of an according section. The reason for this adjustment is that the number of border sentences is low compared to the number of non-border sentences, and the tertiary tag set is the smallest classification setup which can still be used to distinguish scenes and non-scenes. Using this tertiary tag set results in all scene-to-scene and nonscene-to-scene sentences being grouped under the SCENE tag, and all scene-to-nonscene ones under NONSCENE.

3.2   Twin BERT Embedding Space Learning

My system is built around the idea of neural embedding space learning. Reimers and Gurevych (2019) introduced the idea of using twin and triplet network-based training setups for fine-tuning transformer language models to map sentences into meaningful semantic vector spaces under the name Sentence Transformers. In their training setup, two or three different sentences are fed into the same transformer language model. These pairs or triplets of sentences are assigned scores such as cosine similarities or concrete training labels. A prediction head which is fed the output of the transformer language model for all two or three sentences is trained to predict the assigned scores or labels. After this training process, the transformer language model can embed sentences into a vector space in which they are well-separated according to the respective training objective.

The idea behind the system presented in this paper is to combine this approach of twin network embedding space learning with the sliding window-based approach from Zehe et al. (2021a). More precisely, my approach is to utilise a twin network-based training setup to learn an embedding space encoding information about a sentence as well as the sentences surrounding it. The goal here is that, within this vector space, the embeddings of sentences bordering scenes are well-separated from those of in-scene ones.
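The tag-set reduction described in Section 3.1 can be sketched as follows. This is a minimal illustration; the label strings and the function name are my assumptions, not identifiers taken from the shared task or the published code:

```python
def to_tertiary(tag: str) -> str:
    """Collapse the quaternary shared-task labels onto the tertiary tag set."""
    if tag in ("scene-to-scene", "nonscene-to-scene"):
        return "SCENE"      # first sentence of a new scene
    if tag == "scene-to-nonscene":
        return "NONSCENE"   # first sentence of a non-scene section
    return "O"              # any in-segment ("no border") sentence

labels = ["no border", "scene-to-scene", "no border", "scene-to-nonscene"]
print([to_tertiary(t) for t in labels])  # ['O', 'SCENE', 'O', 'NONSCENE']
```

Grouping both kinds of scene-opening borders under a single SCENE tag counters the scarcity of border sentences while still allowing scenes and non-scenes to be distinguished.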

Figure 1: The architecture of the neural network model in prediction mode when generating contextualized sentence embeddings.

Instead of a single BERT model as in Reimers and Gurevych (2019), the system uses two of them, one functioning as sentence encoder and one as context encoder. In both cases, the regular pooling layer output of these networks is used to encode given input sentences. While the sentence encoder is only used to predict an embedding for a given target sentence, the context encoder additionally predicts sentence embeddings for a context window of n sentences to the left and to the right of this target sentence. The output of both encoders is concatenated to acquire the final embedding of a sentence and its context in vector space:

    m(s_t) = e_{sent}(s_t) ⊕ e_{cont}(s_t)                        (1)

    e_{sent}(s_t) = B_1(s_t)                                      (2)

    e_{cont}(s_t) = c_{left}(s_t) ⊕ B_2(s_t) ⊕ c_{right}(s_t)     (3)

    c_{left}(s_t) = B_2(s_{t-n}) ⊕ · · · ⊕ B_2(s_{t-1})           (4)

    c_{right}(s_t) = B_2(s_{t+1}) ⊕ · · · ⊕ B_2(s_{t+n})          (5)

In these equations, s_t is a given sentence at time step (position in text) t, m is the function used for predicting embeddings, e_{sent} and e_{cont} are the two encoder functions, B_1 and B_2 refer to the two underlying BERT networks, and c_{left}(s_t) and c_{right}(s_t) are the functions used for acquiring the context of a given sentence s_t. n determines the size of this context.

For training such a sentence embedding model, I randomly sampled from the training set 15000 pairs of sentences which were both either scene or non-scene borders and 15000 pairs where both sentences were from different categories, the majority of them being pairs of scene border and in-scene sentences. While the former set of pairs is assigned a score of 1, the pairs from the latter set are assigned a score of -1.

    m_{concat}(p) = m(s_1(p)) ⊕ m(s_2(p))                         (6)

    f(p) = L(m_{concat}(p))                                       (7)

In these equations, p refers to a triple of two sentences from the training set and an according score (-1 or 1, depending on class equality), and s_1(p) and s_2(p) are functions retrieving the first respectively second sentence from a given training input triple. f(p) refers to the final output score calculated by the network during training and L to a linear feed-forward layer. During training, both sentences of a triple and their according local context sentences are propagated through the sentence and context encoders, respectively. The pooling layer outputs for both sentences are concatenated and propagated into a linear layer whose single output neuron is trained to predict the according score using the hinge embedding loss:

    ℓ(x, y) = x                 if y = 1
    ℓ(x, y) = max(0, δ − x)     if y = −1                         (8)

Within this function, x is a predicted score, y a gold standard one and δ the so-called margin, a hyperparameter which can be used to control the distances between the vectors a given model learns. This function is used to learn a maximum margin-like embedding space which separates scene borders from in-scene sentences.

The GermanBERT variant provided by Huggingface Transformers (Wolf et al., 2020) under the id bert-base-german-dbmdz-uncased (https://huggingface.co/bert-base-german-dbmdz-uncased) is used as a base for both the sentence encoder and the context encoder. The reason for choosing this model was that the data it was pre-trained on includes narrative texts, which makes it an appropriate basis for a model dealing with literary data. The model was trained using AdamW (Kingma and Ba, 2015; Loshchilov and Hutter, 2019) with the learning rate set to 0.000001 and the weight decay to 0.0001.
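The pipeline defined by equations (1)–(8) can be sketched as follows. This is a toy illustration under stated assumptions: random projections stand in for the two BERT encoders, the dimensions are illustrative, and all names are mine:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N = 4, 2  # toy embedding size and context window n, not the paper's values

# Random projections standing in for the two BERT encoders B1 and B2.
W1 = rng.normal(size=(DIM, DIM))
W2 = rng.normal(size=(DIM, DIM))

def B1(s):  # sentence encoder stand-in
    return np.tanh(s @ W1)

def B2(s):  # context encoder stand-in
    return np.tanh(s @ W2)

sentences = rng.normal(size=(7, DIM))  # a toy "document" of 7 sentence vectors

def m(t):
    """Equations (1)-(5): B1(s_t) concatenated with B2 over s_{t-n}..s_{t+n}."""
    e_cont = np.concatenate([B2(sentences[i]) for i in range(t - N, t + N + 1)])
    return np.concatenate([B1(sentences[t]), e_cont])

def hinge_embedding_loss(x, y, delta=1.0):
    """Equation (8): x for positive pairs, max(0, delta - x) for negative ones."""
    return x if y == 1 else max(0.0, delta - x)

# Equations (6)-(7): a linear layer scores the concatenated pair embedding.
w = rng.normal(size=2 * (DIM + (2 * N + 1) * DIM))
pair = np.concatenate([m(2), m(3)])
score = float(w @ pair)
loss = hinge_embedding_loss(score, y=-1)
print(m(2).shape)  # (24,)
```

Note that for a negative pair (y = −1) the loss only vanishes once the predicted score exceeds the margin δ, which is what pushes border and in-scene sentences apart in the learned space.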

Figure 2: A visualisation of the twin network-based training setup.

The embedding model was trained for one epoch using a constant schedule with warm-up, i.e. a linearly increasing learning rate for the first 1000 iterations which is then held constant. No batch processing was used during training.

As visible in figure 3, the model indeed learned to embed sentences into a vector space in which they were well-separated into two distinct clusters. However, it does not seem that the model generalized well from the training data the idea of what exactly constitutes a scene border. While for 'Der kleine Chinesengott', the German dime novel provided as trial corpus, the majority of scene borders is located in the smaller of the two clusters, there are also borders located in the larger cluster, and, moreover, many in-scene sentences are also sorted into the smaller cluster. This phenomenon was visible after multiple training runs with different sampled pairs of sentences, which implies that drawing clear distinctions between scene borders and in-scene sentences is hard for solely BERT-based models.

3.3   Gradient-Boosted Decision Tree Ensembles

As the embedding model seemingly did not learn a precise enough distinction between scene borders and in-scene sentences, using maximum margin classification with the resulting embeddings as feature vectors was not an option. Instead, I chose gradient-boosted decision tree ensembles (Mason et al., 1999) as classification algorithm because of their ability to select distinctive features and ignore less distinctive ones.

During training, this algorithm creates an ensemble of weak regression trees trained to predict the logits within a specialized logistic regression setup. Combining enough of such trees results in a strong learner. This is conducted by means of gradient descent and decision tree learning. Each subsequent tree is trained to correct erroneous predictions of the previous ones. As each of them is limited to using only a small subset of the input features provided in given input feature vectors, the trained ensemble can automatically isolate the features which globally distinguish scene borders from in-scene sentences best within the training set.

For implementing this part of the system, I used Catboost (Prokhorenkova et al., 2018) as framework. The model is based upon its multi-class classification mode. The tree growth policy is set to lossguide and class weights are used. The following formula is used for calculating them:

    w_c = 1 − num(c) / Σ_{c′ ∈ C} num(c′)                         (9)

Here, w_c is the weight of class c, C is the set of all classes, and num(c) is a function which returns the number of training examples for a given class. Additionally, I used early stopping to prevent overfitting. For this, I set the number of training iterations to 5000, let the framework choose a learning rate automatically, and then used the checkpoint of the model which performed best on the trial dime novel.

4   Evaluation

4.1   Results

Shared task evaluations were carried out on two different corpora, resulting in two different evaluation tracks. The first of these corpora consisted of 5 more dime novels similar to the ones the systems were trained on and addressed the in-domain transfer capabilities of the participating systems. The corpus used for the second track consisted of two pieces of highbrow German literature. The aim of this track was to evaluate the out-of-domain transfer capabilities of the participating systems.
Figure 3: The embeddings predicted for the sentences from the dime novel 'Der kleine Chinesengott' used as trial data in the shared task, visualized in 2D using principal component analysis (Pearson, 1901). 0/brown corresponds to in-scene sentences, 1/green to scene borders and 2/blue to non-scene borders.

My system ranked second out of four in the first track, reaching a micro F1 of 0.16, and first out of five in the second track, reaching a micro F1 score of 0.26. These results confirm the difficulty of this task observed by Zehe et al. (2021a).

    Track                  F1     γ      Rank
    Dime Novels            0.16   0.085  2/4
    Highbrow Literature    0.26   0.175  1/5

Table 1: The shared task evaluation results of my system.

4.2   Qualitative Error Analysis

To further analyze the results of my system, I turned to qualitative error analysis. For this purpose, I collected the false negative and false positive scene border sentences detected by my system for the trial corpus and analyzed a selection of them with regard to common structural patterns. 128 of the sentences marked as scene borders within the trial corpus were false positives. What became quickly visible was that some false positives contained changes of time, character constellations and/or location. As these function as important signals for a scene change, the model seems to have overgeneralized such cases. The following utterances are examples of a signified change in time from among the false positives:

    Langsam verstrich die Zeit. ('Slowly, time passed.')
    Natürlich kamen wir zu spät. ('Of course we arrived too late.')
    unendlich langsam verstrich die Zeit [...]. ('time passed infinitely slowly [...]')
    Ich wartete also noch eine Weile, dann aber [...] ('So I waited a while longer, but then [...]')
    Gerade in dem Moment vernahm ich [...] ('Just at that moment I heard [...]')

Examples of a change in character constellation are the following:

    Bills Alarmruf hatte den Spitzbuben verscheucht. ('Bill's alarm call had scared off the rascal.')
    Der Verfolger war [...] untergetaucht. ('The pursuer had [...] disappeared.')
    Da hörte ich Tom plötzlich aufstehen [...]. ('Then I suddenly heard Tom get up [...]')
    Tom erhob sich jetzt und entschuldigte sich [...]. ('Tom now rose and excused himself [...]')
    Dem herbeieilenden Portier berichtete ich [...]. ('I reported [...] to the porter hurrying over.')
    Ich war wieder allein [...]. ('I was alone again [...]')
    Bill meldete in diesem Moment den Besuch Dr. Türks. ('At that moment, Bill announced the visit of Dr. Türk.')
    Ich fand ihn ohnmächtig auf dem Fußboden liegen. ('I found him lying unconscious on the floor.')
The following utterances are examples of a location change:

    Wir verließen unser Häuschen [...]. ('We left our little house [...]')
    ”Schnell, zu Wertheim,” raunte Tom mir zu. ('”Quick, to Wertheim,” Tom whispered to me.')
    Wir trafen uns erst wieder draußen in der Linienstraße. ('We only met again outside in the Linienstraße.')
    Wir durchsuchten noch einmal das Arbeitszimmer [...]. ('We searched the study once more [...]')
    Endlich erreichten wir den kleinen Antiquitätenladen. ('Finally, we reached the little antique shop.')
    Ich fuhr zur Linienstraße. ('I drove to the Linienstraße.')
    Dann aber schlich ich mich in den dunklen Hausflur. ('But then I crept into the dark hallway.')

Most false positive sentences mention time, characters or location without explicitly signifying a change. This supports the assumption that the model might have overgeneralized these signals:

    In der Nähe des schlesischen Bahnhofs. ('Near the Silesian station.')
    ”Tom, was tust Du, mußte das sein!” ('”Tom, what are you doing, did that have to be!”')
    Bill lag wieder still. ('Bill lay still again.')
    Auch Tom lauschte und schien unschlüssig zu sein. ('Tom was listening, too, and seemed undecided.')
    Isaak Kornblum besaß Telephon. ('Isaak Kornblum owned a telephone.')
    Ich tat es. ('I did it.')

On the other hand, many of the false negatives contain similar signals. This puts the assumption that the model might have overgeneralized upon such signals into question. Of course, one needs to consider that the majority of dimensions of the respective embeddings encode sentences from the context of a particular target sentence. Given this fact in combination with the observation that false positives and false negatives share similar patterns, it seems very likely that these local context sentences have played a major role in classification. The following utterances are examples of false negatives:

    Tom eilte jetzt die Treppe empor [...]. ('Tom now hurried up the stairs [...]')
    Mein Weg ging über die Gartenmauer. ('My way led over the garden wall.')
    Dann verschwand er lautlos durch die Vordiele. ('Then he vanished silently through the front hall.')
    Wir [...] verließen schnell den Laden. ('We [...] quickly left the shop.')
    dann stieg er die Leiter empor. ('then he climbed up the ladder.')
    Tom verschwand schnell durch die Verbindungstür [...]. ('Tom quickly disappeared through the connecting door [...]')

5   Conclusion & Outlook

I presented my submission to the shared task on scene segmentation at KONVENS 2021, a system aimed at segmenting German narrative texts into distinct scenes – spans of text where character constellations, discourse and story time, and locations stay more or less the same. For its implementation, the task was interpreted as a sentence-in-context classification task. For solving it, I first trained a neural model consisting of two GermanBERT networks, the sentence encoder and the context encoder, which, in conjunction, predict contextualized sentence embeddings. This was conducted in a twin network setup where triplets of two sentences and an according score were fed to a linear layer responsible for predicting this score. The goal behind this was to train a model able to embed sentences into a vector space in which sentences functioning as scene borders would be well-separated from in-scene ones, so that the embeddings could then be used as feature vectors in regular classification. While the model indeed learned a vector space in which sentences were more or less sorted into two distinct clusters, these clusters did not seem to capture a general understanding of the concept of scene borders. This is shown by the observation that gold standard scene borders from the trial set were sorted into both clusters when embedded by the model.

For this reason, gradient boosting was chosen as the subsequent classification algorithm for its ability to isolate a subset of features which would still be able to separate the classes well. Early stopping was used during training, meaning that the model was trained for 5000 iterations on the shared task training data and the iteration of the model which achieved the best results on the trial data set was chosen as final. This achieved comparably poor results with micro F1 scores of 0.16 for track 1 and 0.26 for track 2. Nonetheless, these results were sufficient for ranks 2/4 and 1/5 in the two tracks.

It is an interesting observation that my system performs better on highbrow literature in spite of the fact that its training data consisted solely of dime novels, as this contradicts the assumption of the authors that dime novels would be potentially easier to deal with for participating systems compared to highbrow literature. A possible explanation for this could lie in the more formal nature of highbrow literature, which might result in more regularities that are useful for successful classification. However, without further inspection, this remains speculation.

Further work could be the optimization of the architecture and training procedure of the contextualized sentence embedding model presented in this paper. This might lead to improved downstream training results. Moreover, as gradient boosting functions as a feature-based learning algorithm, it could be an option to combine contextualized sentence embeddings with statistical and hand-crafted features for representing sentences in context. In general, it can be said that the problem is far from solved, as suggested by the poor results. However, the idea of learning contextualized sentence embeddings and the optimization of the according training procedure could be a useful option for future work on the topic.
                                                              47
training procedure could be a useful option to for               Liudmila Ostroumova Prokhorenkova, Gleb Gusev,
future work on the topic.                                          Aleksandr Vorobev, Anna Veronika Dorogush, and
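The twin setup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the submitted implementation: the two German-BERT encoders are replaced by tiny stand-in encoders (random embedding tables with mean pooling), and all dimensions, vocabulary sizes, and the scoring head are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(vocab_size, dim):
    """Stand-in for a German-BERT encoder: an embedding table with
    mean pooling. In the real system this would be a full BERT model."""
    table = rng.normal(size=(vocab_size, dim))
    return lambda token_ids: table[token_ids].mean(axis=0)

DIM = 8
encode_sentence = make_encoder(100, DIM)  # twin branch 1: the sentence itself
encode_context = make_encoder(100, DIM)   # twin branch 2: its surrounding context

def contextualized_embedding(sentence_ids, context_ids):
    # Concatenating both encoder outputs yields the contextualized
    # sentence embedding -> shape (2 * DIM,)
    return np.concatenate([encode_sentence(sentence_ids),
                           encode_context(context_ids)])

# Linear scoring head used only during twin training: it maps a pair of
# contextualized embeddings to one predicted score, which would be
# regressed against the target score of the training triplet.
W = rng.normal(size=(4 * DIM,))
b = 0.0

def predict_score(pair_a, pair_b):
    e_a = contextualized_embedding(*pair_a)
    e_b = contextualized_embedding(*pair_b)
    return float(np.concatenate([e_a, e_b]) @ W + b)
```

After training, the scoring head would be discarded and `contextualized_embedding` used to produce the sentence representations passed to the downstream gradient-boosted decision tree classifier.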
References

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3(4-5):993–1022.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.

Marti A. Hearst. 1997. TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics, 23(1):33–64.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings.

Jiwei Li and Dan Jurafsky. 2017. Neural net models of open-domain discourse coherence. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 198–209, Copenhagen, Denmark. Association for Computational Linguistics.

Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.

Llew Mason, Jonathan Baxter, Peter Bartlett, and Marcus Frean. 1999. Boosting algorithms as gradient descent. In Proceedings of the 12th International Conference on Neural Information Processing Systems, NIPS'99, pages 512–518, Cambridge, MA, USA. MIT Press.

Hemant Misra, François Yvon, Olivier Cappé, and Joemon Jose. 2011. Text segmentation: A topic modeling perspective. Information Processing & Management, 47(4):528–544.

Karl Pearson. 1901. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559–572.

Karl Pichotta and Raymond J. Mooney. 2016. Learning statistical scripts with LSTM recurrent neural networks. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI'16, pages 2800–2806. AAAI Press.

Liudmila Ostroumova Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. 2018. CatBoost: Unbiased boosting with categorical features. In Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pages 6639–6649.

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982–3992, Hong Kong, China. Association for Computational Linguistics.

Martin Riedl and Chris Biemann. 2012. TopicTiling: A text segmentation algorithm based on LDA. In Proceedings of the ACL 2012 Student Research Workshop, pages 37–42, Jeju Island, Korea. Association for Computational Linguistics.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online. Association for Computational Linguistics.

Albin Zehe, Leonard Konle, Lea Katharina Dümpelmann, Evelyn Gius, Andreas Hotho, Fotis Jannidis, Lucas Kaufmann, Markus Krug, Frank Puppe, Nils Reiter, Annekea Schreiber, and Nathalie Wiedmer. 2021a. Detecting scenes in fiction: A new segmentation task. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 3167–3177, Online. Association for Computational Linguistics.

Albin Zehe, Leonard Konle, Svenja Guhr, Lea Katharina Dümpelmann, Evelyn Gius, Andreas Hotho, Fotis Jannidis, Lucas Kaufmann, Markus Krug, Frank Puppe, Nils Reiter, and Annekea Schreiber. 2021b. Shared task on scene segmentation@KONVENS 2021. In Shared Task on Scene Segmentation.