<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Second Conference on Computational Humanities Research, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Predicting Structural Elements in German Drama</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Janis Pagel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nidhi Sihag</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nils Reiter</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Digital Humanities, University of Cologne</institution>
          ,
          <addr-line>Albertus-Magnus-Platz, 50931 Cologne</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Natural Language Processing, University of Stuttgart</institution>
          ,
          <addr-line>Pfaffenwaldring 5b, 70569 Stuttgart</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Stuttgart</institution>
          ,
          <addr-line>Keplerstrasse 7, 70174 Stuttgart</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>1</volume>
      <fpage>7</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>We address the challenge of enriching plain text dramas with predicted TEI/XML elements. We use a large corpus of dramas annotated with TEI information about act/scene changes, speaker changes, and stage directions, among others. On this data, we fine-tune a pre-trained BERT transformer model on several subtasks, like predicting stage directions vs. utterances. We show that the used architecture is able to predict the learned structural elements on unseen data for several settings and models.</p>
      </abstract>
      <kwd-group>
        <kwd>TEI</kwd>
        <kwd>Text Segmentation</kwd>
        <kwd>Dramatic Texts</kwd>
        <kwd>Computational Literary Studies</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>ACT 1
Scene 1
Enter Barnardo and Francisco, two sentinels.</p>
      <p>BARNARDO Who’s there?
FRANCISCO Nay, answer me. Stand and
unfold yourself.</p>
      <p>BARNARDO Long live the King!
FRANCISCO Barnardo.</p>
      <p>…
More generally, automatically predicting text structure from plain text offers new ways to
use texts, as layout, text formatting or structure play an important role in many historical text
types and in their use in computational humanities research. It is important to note that we
do not see this work as editorial work or as a form of interpreting the text by inserting markup.
Instead, our goal is to create machine-readable markup for structures that are already overt
in the text and that can later be used for computational analyses.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        While there are many attempts at automatically segmenting texts into smaller, cohesive
units, many of these works focus on recognizing discourse units [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] and do not predict
structural elements suitable for markup. On the other hand, approaches that deal with structure
prediction, e.g. for deriving tables of contents [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] normally operate on a global level and do
not consider local structures (such as stage directions or speaker designations).
      </p>
      <p>
        Most recently, McConnaughey, Dai, and Bamman [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] report results of different models for
recognizing segments in 1,055 historical, OCRed books. They test a conditional random
fields (CRF) model, a random forest model and a bi-directional LSTM (Long Short-Term
Memory) architecture on the task of detecting labels such as title, table of contents or appendix,
using features such as detected keywords, alphabetical order
of page contents or density of characters on a page. They operate at the page level, but allow
single pages to carry more than one label. They find that the LSTM model performs best and
that the two sequence-based models (LSTM and CRF) perform better than the random forest
model; however, they also note that the differences in the results are not significant and that
there is no apparent advantage of using a sequence-based model over models that look at each
instance in isolation.
      </p>
      <p>
        Pethe, Kim, and Skiena [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] work on 9,126 books from Project Gutenberg and aim at
identifying chapter boundaries by first creating ground truth data from the Gutenberg documents
and then using this data as a basis for training and evaluation. They create the silver-standard
ground truth dataset by using a pre-trained BERT model that retrieves a list of potential
chapter heading candidates and filtering this list with over 1,000 different regular expression
rules. After removing the chapter headings from the data, they test different models, such as a
fine-tuned BERT model, to predict the breakpoints. The BERT model using a context window
of 254 words performs best on all evaluation tasks and metrics.
      </p>
      <p>
        Zehe et al. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] present a dataset of scene annotations for literary prose texts. Their definition
of scene is mostly based on narratological assumptions, and hence they distinguish scenes
from non-scenes. They also present preliminary experiments on automatically detecting these
segments using a stock BERT model. They point out that using a non-fine-tuned BERT
model is not sufficient for solving the task. They specifically notice that the model cannot
detect the beginning of scenes after non-scene episodes. As they also point out, non-scenes
are only seldom found in dramatic texts. We share their approach of classifying at the
sentence level and differentiating between different types of segmentation.
      </p>
      <p>To our knowledge, there is no research on the automatic segmentation of dramas.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Model</title>
      <p>We make use of a Bidirectional Encoder Representations from Transformers (BERT)
architecture as our base model. BERT is a large neural network architecture, with a number of
parameters that can range from 100 million to over 300 million.</p>
      <p>It is usually preferable to use a pre-trained BERT model that was trained on a huge dataset.
For the following experiments, we use models that were trained on English and German data.
While the English models are trained on a different language than that of the dramatic texts, this
allows us to show how much the model relies on structural as opposed to semantic information.</p>
      <sec id="sec-3-1">
        <title>3.1. Architecture</title>
        <p>Our models are based on the pre-trained BERT model. It consists of 12 layers with a deep
self-attention mechanism. We feed the BERT model with the tokenized sentences, represented by
their input ids and attention mask. The output of the BERT model is passed on to a dropout
layer (rate: 10 %) to prevent our model from overfitting. The output of the
dropout layer is passed on to a rectified linear activation function to counter the vanishing
gradient problem and allow our model to learn faster and perform better.</p>
        <p>After that, two dense layers are used. The first layer has 768 units, equal to the number of
hidden states of the pre-trained BERT model. It is followed by a second dense layer
with a softmax activation function.</p>
        <p>During fine-tuning, we freeze all the layers of the model, to prevent any updating of model
weights.</p>
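        <p>The data flow through the classification head can be illustrated with a small, dependency-free sketch. It is not our PyTorch implementation: it assumes that BERT yields one vector per sentence, that no non-linearity sits between the two dense layers (the description above does not mention one), and it uses two dimensions and made-up weights instead of 768 dimensions and learned ones.</p>
        <preformat><![CDATA[
```python
import math


def relu(vector):
    """Rectified linear unit, applied element-wise."""
    return [max(0.0, value) for value in vector]


def dense(vector, weights, bias):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(scores):
    """Turn raw scores into probabilities (shifted by the maximum for stability)."""
    highest = max(scores)
    exponentials = [math.exp(score - highest) for score in scores]
    total = sum(exponentials)
    return [value / total for value in exponentials]


def classification_head(bert_output, w1, b1, w2, b2):
    """Inference-time forward pass; dropout is the identity at inference and is omitted."""
    hidden = dense(relu(bert_output), w1, b1)
    return softmax(dense(hidden, w2, b2))


# Toy call: 2 dimensions instead of 768, hypothetical identity weights.
probabilities = classification_head(
    [1.0, -2.0],
    w1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
    w2=[[1.0, 0.0], [0.0, 1.0]], b2=[0.0, 0.0],
)
```
]]></preformat>
        <p>The second dense layer has one output unit per class, so the returned list contains one probability per class and sums to one.</p>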
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Data</title>
      <p>
        We are using the German Drama Corpus (GerDraCor), a collection of German-language plays
from 1730 to the 1940s [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. These files are encoded with TEI tags that represent structural
properties of the plays. The following tags are relevant to our task: &lt;div type="act"&gt;
indicates that a new act or scene is starting; &lt;head&gt;I. Akt&lt;/head&gt; indicates the number of
the act or scene; &lt;stage&gt; contains all the stage directions, such as “tall, strong man with a coal-black
beard, throws the dice with a great noise”; &lt;speaker&gt; contains the name of the character that
is currently speaking; and finally &lt;p&gt; and &lt;l&gt;, which contain the text that is spoken.
      </p>
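      <p>As an illustration of how labelled text segments could be read from such files, the following minimal sketch walks a TEI tree with Python's standard library. The sample play, the function name and the mapping from tags to class ids are assumptions made for this example (the ids follow the five-class numbering used in Section 5); it is a sketch rather than the extraction code used in the experiments.</p>
      <preformat><![CDATA[
```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Assumed mapping from TEI element names to class ids.
TAG_TO_LABEL = {"stage": 2, "speaker": 3, "p": 4, "l": 4}
# A <head> is labelled as act (0) or scene (1) depending on the enclosing <div>.
DIV_TYPE_TO_LABEL = {"act": 0, "scene": 1}

# Hypothetical miniature play, used only to exercise the function.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
<div type="act"><head>I. Akt</head>
<div type="scene"><head>1. Szene</head>
<stage>Ein Zimmer.</stage>
<sp><speaker>FRANZ</speaker><p>Wer da?</p></sp>
</div></div>
</body></text></TEI>"""


def extract_segments(xml_text):
    """Return (text, label) pairs in document order."""
    root = ET.fromstring(xml_text)
    segments = []

    def visit(element, div_type):
        name = element.tag.replace(TEI_NS, "")
        if name == "div":
            div_type = element.get("type", div_type)
        # Collapse all whitespace inside the element to single spaces.
        text = " ".join("".join(element.itertext()).split())
        if name == "head" and div_type in DIV_TYPE_TO_LABEL:
            segments.append((text, DIV_TYPE_TO_LABEL[div_type]))
        elif name in TAG_TO_LABEL:
            segments.append((text, TAG_TO_LABEL[name]))
            # Simplification: do not descend further, so a <stage> nested
            # inside a <p> is kept as part of the speech text.
            return
        for child in element:
            visit(child, div_type)

    visit(root, None)
    return segments


segments = extract_segments(SAMPLE)
```
]]></preformat>
      <p>For the sample above, the function yields the act heading, the scene heading, the stage direction, the speaker name and the utterance, in that order, each paired with its class id.</p>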
      <p>We use single sentences as input for the BERT model. This is a compromise, as some
tags will only cover single words or incomplete sentences. Using whole sentences as input has
advantages though, as it gives more context to the model and it is straightforward to later
assign XML tags for sentences or groups of sentences instead of arbitrary sub-sequences.</p>
      <p>Some tags will also have multiple lines of text stored inside them. We therefore use the
NLTK Sentence Tokenizer to split groups of multiple sentences into single sentences, which
can be given as input to our model. The extracted plain text is passed to the tokenizer. The
text coming from inside the speaker tags is also considered as a sentence, even though it usually
just constitutes a name. Figure 2 shows a part of our final pre-processed dataset which we
use for training and testing. While the first column SENTENCE contains all the tokenized
sentences, the second column Decider contains numeric class labels for the tokenized sentences.</p>
      <p>Overall, the dataset contains 1 410 783 sentences with 10 021 598 tokens and 240 794 types.
Since the sentences in the dataset are of varying length, we use padding to make all sequences
have the same length. Since the vast majority of sentences have ten or fewer tokens, we set the
maximum sequence length to 10.</p>
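      <p>The effect of a fixed sequence length of 10 can be sketched as follows. The function name and the padding id 0 are assumptions for illustration, and special tokens added by the BERT tokenizer are ignored for simplicity.</p>
      <preformat><![CDATA[
```python
MAX_LEN = 10  # maximum sequence length stated in the text
PAD_ID = 0    # assumed id of the padding token


def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both of length max_len."""
    ids = list(token_ids[:max_len])       # cut off sequences that are too long
    mask = [1] * len(ids)                 # 1 marks a real token
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding


short_ids, short_mask = pad_or_truncate([5, 6, 7])
long_ids, long_mask = pad_or_truncate(list(range(1, 15)))
```
]]></preformat>
      <p>Shorter sentences are filled up with padding that the attention mask hides from the model, while longer sentences lose everything after the tenth token.</p>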
    </sec>
    <sec id="sec-5">
      <title>5. Experiments</title>
      <sec id="sec-5-1">
        <title>5.1. Baseline</title>
        <p>We implement a simple baseline against which to compare the results of the transformer models. For
the baseline, we choose a conditional random fields (CRF) model, which is able to consider
sequential information. To make the baseline comparable to the BERT models, we also choose
sentences as the input and let the model predict whether a sentence belongs to one of five classes: act
(0), scene (1), stage direction (2), speaker tag (3) or utterance (4). The CRF receives features
extracted from each sentence, namely:
          <list list-type="bullet">
            <list-item><p>The lower-cased surface string of the sentence.</p></list-item>
            <list-item><p>If the sentence contains the German word ‘Akt’.</p></list-item>
            <list-item><p>If the sentence contains the German words ‘Szene’ or ‘Scene’.</p></list-item>
            <list-item><p>If the sentence begins with an uppercase letter.</p></list-item>
            <list-item><p>If the sentence only contains uppercase letters.</p></list-item>
            <list-item><p>If the sentence contains a digit.</p></list-item>
          </list>
        </p>
        <p>For training, we make use of the limited-memory BFGS algorithm and elastic net
regularization.</p>
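        <p>The six features listed above could be computed per sentence roughly as follows. The feature names, the case-sensitive whole-word matching and the example sentences are assumptions of this sketch. Dictionaries of this shape are a common input format for CRF toolkits.</p>
        <preformat><![CDATA[
```python
import re


def sentence_features(sentence):
    """Map one sentence to the six baseline features."""
    words = re.findall(r"\w+", sentence)
    return {
        "lower": sentence.lower(),
        "has_akt": "Akt" in words,
        "has_szene": "Szene" in words or "Scene" in words,
        "starts_upper": sentence[:1].isupper(),
        # str.isupper() ignores digits and punctuation: it is true when the
        # sentence has at least one cased letter and no lowercase letter.
        "all_upper": sentence.isupper(),
        "has_digit": any(char.isdigit() for char in sentence),
    }


# Hypothetical example sentences: an act heading, a speaker name, a scene heading.
features = [sentence_features(s) for s in ["I. Akt", "FRANZ", "2. Szene"]]
```
]]></preformat>
        <p>A sequence model such as a CRF receives one such dictionary per sentence, in the order in which the sentences occur in the play.</p>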
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Experimental Setup</title>
        <p>
          Training. For the BERT models, we use pre-trained models provided by HuggingFace as the basis for
fine-tuning. We use the AdamW algorithm [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], an improved version of Adam, for training,
with a batch size of 256, and we clip the norm of the gradients at 1 as an extra safety measure
against exploding gradients. The model is implemented in PyTorch [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and scikit-learn [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
We use negative log likelihood as the loss function and apply a learning rate of 2e−5. The
training runs for 20 epochs.
        </p>
        <p>Class Weights. Table 1 shows how the classes are distributed among the sentences. Some
classes have many more training examples than others, which introduces bias into our models. To
deal with this problem, we apply class weights to the loss function. These are computed as
the inverse frequency of the classes in the training set.
        </p>
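        <p>One way to read “inverse frequency” is the total number of training sentences divided by the number of sentences in a class. The sketch below follows this reading; the function name and the toy label sequence are assumptions, and other normalizations are equally possible.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by 1 / (relative frequency) = total / count."""
    counts = Counter(labels)
    total = len(labels)
    return {label: total / counts[label] for label in sorted(counts)}


# Hypothetical toy labels: three utterances (4) and one stage direction (2).
weights = inverse_frequency_weights([4, 4, 4, 2])
```
]]></preformat>
        <p>Rare classes receive larger weights, so their errors contribute more to the loss. In PyTorch, such weights could for instance be passed, ordered by class id, as the weight argument of torch.nn.NLLLoss.</p>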
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Evaluation</title>
        <p>We consider accuracy, precision, recall and F1-score as metrics.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Results</title>
        <p>In this section, we investigate the performance of our proposed model on various tasks. We
split the dataset randomly into three sets: train, validation, and test, where the train set is
70% and validation and test set are both 15% of the overall data. We fine-tune the model
using the train and validation set, and evaluate on the test set.</p>
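        <p>A random 70/15/15 split of this kind can be sketched with the standard library. The function name and the fixed seed are assumptions for reproducibility of the example; the text above does not specify a seed.</p>
        <preformat><![CDATA[
```python
import random


def split_dataset(items, train_pct=70, validation_pct=15, seed=0):
    """Shuffle items and cut them into train, validation and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = len(shuffled) * train_pct // 100
    n_validation = len(shuffled) * validation_pct // 100
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_validation],
        shuffled[n_train + n_validation:],  # remainder becomes the test set
    )


train_set, validation_set, test_set = split_dataset(range(100))
```
]]></preformat>
        <p>Every item ends up in exactly one of the three parts, and the test part receives whatever remains after the first two cuts.</p>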
        <p>Detecting Act Boundaries. We extract the data from &lt;stage&gt;, &lt;head&gt; and &lt;speech&gt;. The
sentence splitter recognizes 1 208 899 sentences. The goal of the prediction is to mark all
sentences: if a sentence is the first sentence of an act, it is classified as 1, otherwise as
0. Hence, it is a binary classification task. For this task we use ‘bert-base-uncased’ (the
identifier at HuggingFace) as the base model. Table 2 shows the results as a classification
report and confusion matrix.</p>
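        <p>The labelling scheme of this binary task can be sketched as follows. The input format (a list of acts, each given as a list of sentences), the function name and the example sentences are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
def label_act_boundaries(acts):
    """Label the first sentence of every act with 1 and all others with 0."""
    pairs = []
    for sentences in acts:
        for index, sentence in enumerate(sentences):
            pairs.append((sentence, 1 if index == 0 else 0))
    return pairs


# Hypothetical play with two acts.
labelled = label_act_boundaries([["I. Akt", "Wer da?"], ["II. Akt"]])
```
]]></preformat>
        <p>Since each act contributes exactly one positive example, class 1 is far rarer than class 0 in this setting.</p>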
        <p>We can see that the model is able to predict a non-boundary for nearly 100 % of the cases.
Yet, the model is not overfitting on class 0, as the prediction for an act boundary still gets
high scores, with an F1-score of 0.89. From Table 2b we can see that the model makes more
mistakes in wrongly classifying non-act-boundaries as act changes than the other way around
(i.e., the number of false positives is higher than the number of false negatives).</p>
        <p>Detecting Stage Directions. For this task we use the &lt;stage&gt; and &lt;speech&gt; tags, which
contain 1 203 911 sentences. Each sentence is classified as to whether it is (part of) a stage
direction or not, using the ‘bert-base-uncased’ model. As stage directions are much more
frequent than act boundaries and much more similar to character speech, we consider this task to be more difficult
than the one discussed above: instead of relying on lexical cues, the model needs to take discourse
structure and semantic information into account.</p>
        <p>Table 3 shows the results. Here, 0 represents character speech and 1 represents stage
directions. The model indeed has more difficulty correctly predicting stage directions.
With a precision of 0.7 and a recall of 0.95, the detection nevertheless performs reasonably well.
The confusion matrix shows that false positives are much more common than false negatives,
which can probably be explained by the imbalance in the training dataset.</p>
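        <p>The relation between the confusion matrix and the reported metrics can be made explicit with a short sketch. The function names and the confusion counts are hypothetical; only the precision of 0.7 and the recall of 0.95 are taken from the text.</p>
        <preformat><![CDATA[
```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(true_pos, false_pos, false_neg):
    """Derive the three metrics for one class from its confusion counts."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall, f1_score(precision, recall)


# F1 implied by the rounded precision and recall quoted in the text.
stage_f1 = f1_score(0.7, 0.95)

# Hypothetical counts in which false positives outnumber false negatives.
example = precision_recall_f1(true_pos=95, false_pos=40, false_neg=5)
```
]]></preformat>
        <p>With the rounded values from the text, the harmonic mean comes out at roughly 0.81; the value in Table 3 may differ slightly because it is computed from unrounded counts. The hypothetical counts show the same pattern: many false positives lower precision while recall stays high.</p>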
        <p>(b) Confusion matrix. GS is the gold standard, SO the system output.</p>
        <p>All Tasks Combined. For this task, we extract the data from all of the above-mentioned tags,
which in total contain 1 589 090 sentences. The task is now a 5-way classification, as we classify
sentences as being (part of) a stage direction (2), the name of a speaker (3), character speech (4),
or an act (0) or scene boundary (1).</p>
        <p>For this task we use different types of BERT models and compare them. Table 4 shows
results for ‘bert-base-uncased’. All results are still comparable to the results of classifying
the tags individually. Some of the results are lower, but not by much. This is promising,
as it shows that we can potentially predict the complete structure of a plain text drama at
once without losing much predictive power compared to classifying the single types of structure
individually.</p>
        <p>As mentioned earlier, all models so far have been pre-trained on English data. The above
evaluation shows that even on German data, they can make good predictions, which can be
explained by the fact that most of the distinguishing features needed so far for prediction are
structural rather than content-based. However, for the task of predicting all tags together,
we now use a model trained on German data to see whether the results can be further improved.
Table 5 shows the results of applying the ‘bert-base-german-uncased’ model. We can
see that especially for predicting stage directions, the performance improves by 7
percentage points in F1-score. The other results are either identical or slightly higher, in the case
of speech with a gain of 2 percentage points. This is expected, as these two types
are more content-based. Still, the English model is able to pick up on enough structural cues
to also predict well on German data.</p>
        <p>Lastly, we check whether it makes a difference to use a model that was trained on cased data, as
all previous models were trained on uncased data. Here, the ‘bert-base-german-cased’
model has been used. The results can be found in Table 6 and are slightly lower than
in the uncased setting. This suggests that preserving case lets the model generalize less well.</p>
        <p>Baseline. We compare these final results to the baseline system. The results are shown in
Table 7. The baseline performs rather well on the tasks of predicting act and scene boundaries
and recognizing speaker tags. However, the BERT-based models achieve slightly higher values
for all these classes. For the task of character speech identification, it performs worse than the
BERT-based models in terms of precision, but achieves a higher recall than all other models.
For the crucial task of recognizing stage directions, it returns a rather low recall value, but the
highest precision value.</p>
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Summary of the results</title>
        <p>In all experiments, we observe that the models achieve precision and recall scores around 95 %
to 99 % for most of the categories. For stage directions, the evaluation yields lower scores:
The model misclassifies some of the sentences in character speech as stage directions. By
experimenting with different BERT models, we are able to achieve a precision of 77 % for stage
directions which means that BERT German Uncased is the most suitable model for these
predictions. While the CRF-based model sets a high baseline for the tasks of act, scene and
speaker recognition, the BERT-based models outperform the baseline in all measures. Only
for the content-based tasks of speech and stage direction recognition, the baseline achieves
higher results in recall and precision, respectively. In future work, the transformer models
might benefit from combining them with the CRF model.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future work</title>
      <p>In this paper, we have shown that the BERT model is a reasonable model for predicting
and extracting structural segments from dramatic texts. Based on this finding, we have
proposed a novel fine-tuned model based on BERT. From the above results we can conclude that
‘BERT_German_Uncased’ is the most effective base model. We can also conclude from
the above results that all the models perform quite well, whether we predict the segments with
binary classification or in the full model with five classes. We were further able to show that
models trained on English data are able to predict the more structural elements of German
dramatic texts with high accuracy. However, for the structural elements that rely more on
text content, a model trained on German data performs better.</p>
      <p>Both recall and precision are quite high for all classes except class 2 (stage directions), which
means that the model predicts these classes accurately. The recall for class 2 is 0.93, which
means that the model was able to find 93 % of the stage direction sentences. However, precision
is somewhat lower for class 2, which means that the model misclassifies some of the class 4 sentences
(character speech) as stage directions.</p>
      <p>In the future, we plan to extend the presented work to create a fully automatic
mapping tool that converts plain text scans of dramatic texts into properly structured TEI/XML
documents. Even if this automatic conversion is likely to contain some errors, correcting it
manually is much less labor-intensive than encoding the entire play by hand. We plan to add
texts currently only available in plain text to the DraCor corpora once the above-mentioned
tool is developed and functioning. One challenge we will most likely face is that OCRed
texts usually contain mistakes, which might throw off the transformer model. Hence we will
also experiment with text normalization techniques. This opens a path towards large scale
data analysis of plays that currently are not available as part of the DraCor repository. In
addition, our analysis has shown that the trained model works reasonably well even if used
across language boundaries. This suggests that it is also possible to apply a very similar model
on plays from other languages, as training data for many languages is already available.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>The first and third authors conducted the described research within the QuaDramA
project, funded by the Volkswagen Foundation, and within the Q:TRACK project, funded by
the German Research Foundation (DFG) in the context of SPP 2207 Computational Literary
Studies. We thank both for making this possible.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Doucet</surname>
          </string-name>
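          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kazai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Colutto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Mühlberger</surname>
          </string-name>
          . “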
          <article-title>Overview of the ICDAR 2013 Competition on Book Structure Extraction”</article-title>
          .
          <source>In: Proceedings of the Twelfth International Conference on Document Analysis and Recognition (ICDAR)</source>
          . Washington, D.C., US,
          <year>2013</year>
          , pp.
          <fpage>1438</fpage>
          -
          <lpage>1443</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Börner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Göbel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hechtl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kittel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Milling</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Trilcke</surname>
          </string-name>
          . “
          <article-title>Programmable Corpora: Introducing DraCor, an Infrastructure for the Research on European Drama”</article-title>
          .
          <source>In: Proceedings of DH2019: “Complexities”</source>
          . Utrecht, The Netherlands,
          <year>2019</year>
          . doi: 10.5281/zenodo.4284002.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hearst</surname>
          </string-name>
          . “
          <article-title>TextTiling: Segmenting Text into Multi-paragraph Subtopic Passages”</article-title>
          .
          <source>In: Computational Linguistics</source>
          <volume>23</volume>
          .
          <issue>1</issue>
          (
          <year>1997</year>
          ), pp.
          <fpage>33</fpage>
          -
          <lpage>64</lpage>
          . url: https://www.aclweb.org/anthology/J97-1003.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A. K.</given-names>
            <surname>John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Di Caro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Boella</surname>
          </string-name>
          . “
          <article-title>Text Segmentation with Topic Modeling and Entity Coherence”</article-title>
          .
          <source>In: Proceedings of the 16th International Conference on Hybrid Intelligent Systems (HIS)</source>
          . Ed. by
          <string-name>
            <given-names>A.</given-names>
            <surname>Abraham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Haqiq</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Alimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Mezzour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rokbani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A. K.</given-names>
            <surname>Muda</surname>
          </string-name>
          . Vol.
          <volume>552</volume>
          .
          <source>Advances in Intelligent Systems and Computing (AISC)</source>
          . Springer,
          <year>2017</year>
          , pp.
          <fpage>175</fpage>
          -
          <lpage>185</lpage>
          . doi: 10.1007/978-3-319-52941-7_18. url: https://link.springer.com/chapter/10.1007%2F978-3-319-52941-7%5F18.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          . “
          <article-title>Decoupled Weight Decay Regularization”</article-title>
          .
          <source>In: International Conference on Learning Representations</source>
          .
          <year>2019</year>
          . url: https://openreview.net/forum?id=Bkg6RiCqY7.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>McConnaughey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Bamman</surname>
          </string-name>
          . “
          <article-title>The Labeled Segmentation of Printed Books”</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . Copenhagen, Denmark,
          <year>2017</year>
          , pp.
          <fpage>737</fpage>
          -
          <lpage>747</lpage>
          . doi: 10.18653/v1/D17-1077. url: https://aclanthology.org/D17-1077.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Paszke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gross</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Massa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lerer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bradbury</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Chanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Killeen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Gimelshein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Antiga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Desmaison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kopf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>DeVito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Raison</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tejani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chilamkurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Steiner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bai</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chintala</surname>
          </string-name>
          . “
          <article-title>PyTorch: An Imperative Style, High-Performance Deep Learning Library</article-title>
          ”.
          In:
          <source>Advances in Neural Information Processing Systems</source>
          <volume>32</volume>
          . Ed. by
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Larochelle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beygelzimer</surname>
          </string-name>
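          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>d'Alché-Buc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fox</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Garnett</surname>
          </string-name>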
          . Curran Associates, Inc.,
          <year>2019</year>
          , pp.
          <fpage>8024</fpage>
          -
          <lpage>8035</lpage>
          . url: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          . “
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          ”.
          In:
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ), pp.
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pethe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Skiena</surname>
          </string-name>
          . “
          <article-title>Chapter Captor: Text Segmentation in Novels</article-title>
          ”.
          In:
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          <year>2020</year>
          , pp.
          <fpage>8373</fpage>
          -
          <lpage>8383</lpage>
          . doi: 10.18653/v1/2020.emnlp-main.672. url: https://aclanthology.org/2020.emnlp-main.672.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pfister</surname>
          </string-name>
          .
          <source>The Theory and Analysis of Drama</source>
          . Trans. by
          <string-name>
            <given-names>J.</given-names>
            <surname>Halliday</surname>
          </string-name>
          .
          <source>European Studies in English Literature</source>
          . Cambridge: Cambridge University Press,
          <year>1988</year>
          . doi: 10.1017/cbo9780511553998.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Zehe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Konle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dümpelmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hotho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Jannidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaufmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Krug</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Puppe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Reiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Wiedmer</surname>
          </string-name>
          . “
          <article-title>Detecting Scenes in Fiction: A new Segmentation Task</article-title>
          ”.
          In:
          <source>Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume</source>
          . Association for Computational Linguistics,
          <year>2021</year>
          , pp.
          <fpage>3167</fpage>
          -
          <lpage>3177</lpage>
          . url: https://www.aclweb.org/anthology/2021.eacl-main.276.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>