Point Break: Surfing Heterogeneous Data for Subtitle Segmentation

Alina Karakanta1,2, Matteo Negri1, Marco Turchi1
1 Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento - Italy
2 University of Trento, Italy
{akarakanta,negri,turchi}@fbk.eu

Abstract

Subtitles, in order to achieve their purpose of transmitting information, need to be easily readable. The segmentation of subtitles into phrases or linguistic units is key to their readability and comprehension. However, automatically segmenting a sentence into subtitles is a challenging task and data containing reliable human segmentation decisions are often scarce. In this paper, we leverage data with noisy segmentation from large subtitle corpora and combine them with smaller amounts of high-quality data in order to train models which perform automatic segmentation of a sentence into subtitles. We show that even a minimum amount of reliable data can lead to readable subtitles and that quality is more important than quantity for the task of subtitle segmentation.1

1 Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

In a world dominated by screens, subtitles are a vital means for facilitating access to information for diverse audiences. Subtitles are classified as interlingual (subtitles in a different language than the original video) and intralingual (in the same language as the original video) (Bartoll, 2004). Viewers normally resort to interlingual subtitles because they do not speak the language of the original video, while intralingual subtitles (also called captions) are used by people who cannot rely solely on the original audio for comprehension. Such viewers are, for example, the deaf and hard of hearing and language learners. Apart from creating a bridge towards information, entertainment and education, subtitles are a means of improving the reading skills of children and immigrants (Gottlieb, 2004). With such a large pool of users and such a wide variety of functions, subtitling is probably the most dominant form of Audiovisual Translation.

Subtitles, however, in order to fulfil the purposes described above, need to be presented on the screen in a way that facilitates readability and comprehension. Bartoll and Tejerina (2010) claim that subtitles which cannot be read, or can be read only with difficulty, 'are almost as bad as no subtitles at all'. Creating readable subtitles comes with several challenges. The difficulty imposed by the transition to a different semiotic means, which takes place when transcribing or translating the original audio into text, is further exacerbated by the limitations of the medium (time and space on screen). Subtitles should not exceed a maximum length, usually ranging between 35-46 characters, depending on screen size and audience age or preferences. They should also be presented at a comfortable reading speed for the viewer. Moreover, chunking or segmentation, i.e. the way a subtitle is split across the screen, has a great impact on comprehension. Studies have shown that a proper segmentation can balance gazing behaviour and subtitle reading (Perego, 2008; Rajendran et al., 2013). Each subtitle should, if possible, have a logical completion. This is equivalent to a segmentation by phrase, sentence or unit of information. Where and whether to insert a subtitle break depends on several factors, such as speech rhythm and pauses, but also semantic and syntactic properties. All this makes segmenting a full sentence into subtitles a complex and challenging problem.
Developing automatic solutions for subtitle segmentation has long been impeded by the lack of representative data. Line breaks are the new lines inside a subtitle block, which are used to split a long subtitle into two shorter lines. This type of break is not present in the subtitle files used to create large subtitling corpora such as OpenSubtitles (Lison and Tiedemann, 2016) and corpora based on TED Talks (Cettolo et al., 2012; Di Gangi et al., 2019), possibly because of encoding issues and the pre-processing of the subtitles into parallel sentences (Karakanta et al., 2019). Recently, MuST-Cinema (Karakanta et al., 2020b), a corpus based on TED Talks, was released, which added the missing line breaks from the subtitle files (.srt2) using an automatic annotation procedure. This makes MuST-Cinema a high-quality resource for the task of subtitle segmentation. However, the size of MuST-Cinema (about 270k sentences) might not be sufficient for developing automatic solutions based on data-hungry neural-network approaches, and its language coverage is so far limited to 7 languages. On the other hand, the OpenSubtitles corpus, despite being rather noisy, constitutes a large resource of subtitling data.

2 http://zuggy.wz.cz/

In this work, we leverage the available subtitling resources in different resource conditions to train models which automatically segment sentences into readable subtitles. The goal is to exploit the advantages of the available resources, i.e. size for OpenSubtitles and quality for MuST-Cinema, for maximising segmentation performance, while also taking into account training efficiency and cost. We experiment with a sequence-to-sequence model, which we train and fine-tune on different amounts of data. More specifically, we hypothesise the condition where data containing high-quality segmentation decisions are scarce or non-existent and we resort to existing resources (OpenSubtitles). We show that high-quality data representative of the task, even in small amounts, are key to finding the break points for readable subtitles.
2 Related work

Automatically segmenting text into subtitles has long been addressed as a post-processing step in a translation/transcription pipeline. In industry, language-specific rules and simple algorithms are employed for this purpose. Most academic approaches to subtitle segmentation make use of a classifier which predicts subtitle breaks. One of these approaches used Support Vector Machine and Logistic Regression classifiers on correctly/incorrectly segmented subtitles to determine subtitle breaks (Álvarez et al., 2014). Extending this work, Álvarez et al. (2017) trained a Conditional Random Field (CRF) classifier for the same task, but in this case making a distinction between line breaks (next subtitle line) and subtitle breaks (next subtitle block). A more recent, neural-based approach (Song et al., 2019) employed a Long Short-Term Memory network (LSTM) to predict the position of the period in order to improve the readability of automatically generated YouTube captions, but without focusing specifically on the segmentation of subtitles. Focusing on the length constraint, Liu et al. (2020) proposed adapting an Automatic Speech Recognition (ASR) system to incorporate transcription and text compression, with a view to generating more readable subtitles.

A recent line of work has paved the way for Neural Machine Translation systems which generate translations segmented into subtitles, here in a bilingual scenario. Matusov et al. (2019) customised an NMT system for subtitles and introduced a segmentation module based on human segmentation decisions learned from OpenSubtitles and on penalties well established in the subtitling industry. Karakanta et al. (2020a) were the first to propose an end-to-end solution for Speech Translation into subtitles. Their findings indicated the importance of prosody, and more specifically pauses, for achieving subtitle segmentation in line with the speech rhythm. They further confirmed the different roles of line breaks (new line inside a subtitle block) and subtitle block breaks (the next subtitle appears on a new screen): while block breaks depend on speech rhythm, line breaks follow syntactic patterns. All this shows that subtitle segmentation is a complex and dynamic process which depends on several and varied factors.

3 Methodology

This section describes the data processing, model and evaluation used for the experiments. All experiments are run for English, as the language with the largest amount of available resources, but the approach is easily extended to all languages. Note that here we are focusing on a monolingual scenario, where subtitle segmentation is seen as a sequence-to-sequence task of passing from English sentences without break symbols to English sentences containing break symbols.

3.1 Data

As training data we use MuST-Cinema and OpenSubtitles. MuST-Cinema contains special symbols to indicate the breaks: <eob> for subtitle breaks and <eol> for line breaks inside a subtitle block. We train models using all data (MC-all) and only 100k sentences (MC-100).3

3 Training a model with 10k data did not bring good results.

The monolingual files for OpenSubtitles come in XML format, where each subtitle block forming a sentence is wrapped in XML tags. We are therefore able to insert the <eob> symbols marking the end of a subtitle block. However, as mentioned above, line breaks are not present in OpenSubtitles. We hence proceed to creating artificial annotations for <eol>. We filter all sentences for which all subtitles have a maximum length of 42 characters (OpenSubs-42). Then, for each <eob>, we substitute it with <eol> with a probability of 0.25, making sure to avoid having two consecutive <eol>, as this would lead to a subtitle of three lines, which occupies too much space on the screen. Since this length constraint results in filtering out a lot of data, we also relax it by allowing sentences with subtitles of up to 48 characters (OpenSubs-48). The motivation for this relaxation is that, if a sequence-to-sequence model is not able to learn the length constraint from the data but instead learns segmentation decisions based on patterns of neighbouring words, having more data will increase the amount and variety of segmentation decisions observed by the model. This may result in more plausible segmentation, possibly though at the expense of length conformity. Dataset sizes are reported in Table 1.

Table 1: Dataset sizes in sentences.

Data           Sentences
MuST-Cinema       275,085
OpenSubs-42       185,758
OpenSubs-48    13,713,708
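To make the artificial annotation procedure concrete, the sketch below shows one possible implementation of the OpenSubs-42 length filter and of the probabilistic substitution of <eob> with <eol>. It is a minimal illustration under our own assumptions (function names, the assumption that break symbols appear as separate whitespace-delimited tokens, and the way consecutive breaks are tracked), not the script used for the experiments.

```python
import random

def within_length_limit(sentence, max_len=42):
    """Keep only sentences whose subtitle blocks all fit in max_len characters.

    The input is assumed to be an OpenSubtitles sentence carrying an <eob>
    symbol at the end of every subtitle block (no <eol> yet), e.g.
    "I didn't know <eob> you were coming tonight. <eob>"
    """
    blocks = [b.strip() for b in sentence.split("<eob>") if b.strip()]
    return all(len(b) <= max_len for b in blocks)

def add_artificial_eol(sentence, eol_prob=0.25):
    """Replace each <eob> with <eol> with probability eol_prob,
    never producing two consecutive <eol> (i.e. a three-line subtitle)."""
    tokens = sentence.split()
    previous_break_was_eol = False
    for i, token in enumerate(tokens):
        if token == "<eob>":
            if not previous_break_was_eol and random.random() < eol_prob:
                tokens[i] = "<eol>"
                previous_break_was_eol = True
            else:
                previous_break_was_eol = False
    return " ".join(tokens)
```

Run over the OpenSubtitles monolingual file, only sentences passing within_length_limit would be kept (with max_len=48 for OpenSubs-48), and add_artificial_eol would then yield the mixed <eol>/<eob> annotation used for training.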
We are interested in the real application scenario where high-quality data containing human segmentation decisions are not available or are scarce. According to our hypothesis, a relatively limited amount of high-quality data can be compensated for by OpenSubtitles. Therefore, we fine-tune each of the OpenSubtitles models on 10k and 100k sentences from MuST-Cinema, which contain high-quality break annotations.

OpenSubtitles and TED Talks have been shown to have large differences and to constitute a sub-classification of the subtitling genre (Müller and Volk, 2013). For this reason, we experiment with 2 test sets for cross-domain evaluation. The first test set is the English one released with MuST-Cinema, containing 10 single-speaker TED Talks (545 sentences). The second test set (782 sentences) is much more diverse. In order to create it, we selected a mix of public and proprietary data, more specifically excerpts from a TV series, a documentary, two short interviews and one advertising video. The subtitling was performed by professional translators and the .srt files were processed to insert the break symbols in the positions where subtitle and line breaks occur.

3.2 Model

The model is a sequence-to-sequence model based on the Transformer architecture (Vaswani et al., 2017), trained using fairseq (Ott et al., 2019) with the same settings as in Karakanta et al. (2020b). It takes as input a full sentence and returns the same sentence annotated with subtitle and line breaks. We process the data into sub-word units with SentencePiece (Kudo and Richardson, 2018) with an 8K vocabulary size. The special symbols are kept as single sub-words. Models were trained until convergence on 1 Nvidia GeForce GTX 1080 Ti GPU.
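The snippet below illustrates one way to obtain this behaviour with the SentencePiece Python API, namely declaring the break symbols as user-defined symbols so that they are never split. The file names and the choice of this particular mechanism are our assumptions; the paper only states the 8K vocabulary and that the symbols are kept as single sub-words.

```python
import sentencepiece as spm

# Train an 8K sub-word model in which <eol> and <eob> are atomic tokens.
# "train.en" stands for the annotated training sentences (hypothetical name).
spm.SentencePieceTrainer.train(
    input="train.en",
    model_prefix="subseg8k",
    vocab_size=8000,
    user_defined_symbols=["<eol>", "<eob>"],
)

sp = spm.SentencePieceProcessor(model_file="subseg8k.model")
pieces = sp.encode(
    "Meditation is a technique <eol> of finding well-being <eob>",
    out_type=str,
)
# Each break symbol surfaces as exactly one piece in `pieces`.
```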
As baseline, we use a simple segmentation approach which inserts a break symbol at the first space before every 42 characters. Of the two types of symbols, <eol> is selected with a 0.25 probability, but we avoid inserting two consecutive <eol>, since this would lead to a subtitle of three lines.
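A minimal sketch of this baseline is given below. It follows our reading of the rule, segmenting greedily word by word so that a break is placed at the last space that keeps the current subtitle within 42 characters; the function name and the treatment of words longer than the limit are our own choices, not a description of the authors' implementation.

```python
import random

def length_baseline(sentence, max_len=42, eol_prob=0.25):
    """Insert <eol>/<eob> purely on length: break before the word that
    would push the current subtitle beyond max_len characters.
    <eol> is picked with probability eol_prob, never twice in a row."""
    out, current_len, previous_break_was_eol = [], 0, False
    for word in sentence.split():
        needed = len(word) + (1 if current_len else 0)  # +1 for the space
        if current_len and current_len + needed > max_len:
            if not previous_break_was_eol and random.random() < eol_prob:
                out.append("<eol>")
                previous_break_was_eol = True
            else:
                out.append("<eob>")
                previous_break_was_eol = False
            current_len = 0
        out.append(word)
        current_len += len(word) + (1 if current_len else 0)
    return " ".join(out)
```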
3.3 Evaluation

Subtitle segmentation is evaluated with the following metrics. First, we compute the precision, recall and F1-score between the output of the segmenter and the human-generated subtitles, in order to test the model's performance at inserting a sufficient number of breaks and at the right positions in the sentence. Additionally, we compute the BLEU score (Papineni et al., 2002) between the output of the segmenter and the human reference. Higher values for BLEU indicate a high similarity between the model's output and the desired output.

Finally, we want to check the performance of the system in generating readable subtitles, therefore we use an intrinsic, task-specific metric. We compute the percentage of subtitles with a length of <= 42 characters (Characters per Line - CPL), according to the TED subtitling guidelines. This shows the ability of the system to segment the sentences into readable subtitles, by producing subtitles that are not too long to appear on the screen. We additionally report training time, as efficiency and cost are important factors for scaling such methods to tens of languages.
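One possible operationalisation of these metrics is sketched below: precision, recall and F1 are computed over the positions (and types) of the predicted break symbols against the reference, and CPL conformity is the share of lines not exceeding 42 characters. Whether the break type must match for a prediction to count as correct, and how the scores are aggregated over a test set, are not spelled out in the paper, so this remains an illustrative assumption.

```python
def breaks_by_position(annotated):
    """Map the index of the word preceding each break to its break symbol."""
    breaks, idx = {}, 0
    for token in annotated.split():
        if token in ("<eol>", "<eob>"):
            breaks[idx] = token
        else:
            idx += 1
    return breaks

def break_prf(hypothesis, reference):
    """Precision/recall/F1 over break positions and types for one sentence pair."""
    hyp, ref = breaks_by_position(hypothesis), breaks_by_position(reference)
    correct = sum(1 for pos, sym in hyp.items() if ref.get(pos) == sym)
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def cpl_conformity(hypothesis, max_len=42):
    """Share of subtitle lines with at most max_len characters (CPL)."""
    lines = [l.strip()
             for l in hypothesis.replace("<eol>", "<eob>").split("<eob>")
             if l.strip()]
    return sum(len(l) <= max_len for l in lines) / len(lines) if lines else 1.0
```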
4 Results

Tables 2 and 3 show the results for the MuST-Cinema and the second test set respectively.

Table 2: Results for the MuST-Cinema test set. Training time in minutes.

Model          BLEU   Prec  Rec  F1   CPL (%)  Time
baseline       55.30    50   47  48      100      -
MC-all         84.00    85   85  85       96    305
MC-100         81.77    84   83  83       94    210
OpenSubs-42    72.24    86   66  73       74    270
  + MC-10      77.99    83   76  79       88    +26
  + MC-100     80.09    87   78  81       88   +250
OpenSubs-48    76.00    77   67  68       72   6980
  + MC-10      82.46    86   80  82       91   +240

Table 3: Results for the second test set. Training time in minutes.

Model          BLEU   Prec  Rec  F1   CPL (%)  Time
baseline       51.45    46   43  44      100      -
MC-all         66.38    72   64  69       97    305
MC-100         65.38    76   64  68       96    210
OpenSubs-42    61.41    84   56  65       79    270
  + MC-10      63.53    76   60  66       93    +26
  + MC-100     65.30    77   62  67       94   +250
OpenSubs-48    63.37    63   56  59       81   6980
  + MC-10      65.66    78   61  67       94   +240

As expected, the simple baseline achieves 100% conformity to the length constraint; it is, however, not accurate in inserting the breaks at the right positions, as shown by the very low BLEU (55.30 and 51.45) and F1 scores (48 and 44). The best performance for all metrics and both test sets is achieved when using all available MuST-Cinema data (MC-all). For the in-domain test set, BLEU and F1 are higher than for the out-of-domain test set; however, the number of subtitles conforming to the length constraint is consistently high (96% and 97%). This suggests that the systems trained on high-quality segmentation data are able to produce readable subtitles in terms of length in diverse testing conditions even without massive amounts of data. Even with 100k sentences of training data (MC-100), the performance of the model, which is the fastest to train, drops only slightly, with -2% for all metrics on the MuST-Cinema test set and -1% on the second test set. This shows that high efficiency can be achieved without dramatically sacrificing quality. This is particularly important for industry applications where tens of languages are involved and training data for a domain might not be vast.

The models trained only on OpenSubtitles show a great drop in performance on the MuST-Cinema test set, which is to be expected because of the different nature of the data. However, the drop is present also for the second test set, which shows that these models are not robust to different domains. Surprisingly, the larger model (OpenSubs-48) does not perform much better than the model with less data (OpenSubs-42), even though it is trained on almost 10 times as much data. This could be an indication of a trade-off between data quality and data size. OpenSubs-48, with more noisy data, has similar recall to OpenSubs-42, but it is much less accurate in the position of the breaks, as shown by the drop in precision (86 vs. 77 and 84 vs. 63). We conjecture that the procedure of artificially inserting <eol> symbols by changing the existing <eob> does not reflect the distribution of the types of breaks in real data. Interestingly, the OpenSubs-42 model, despite being trained only on subtitles with a maximum length of 42 characters, is not able to generate subtitles which respect the length constraint (74% and 79%). It is therefore possible that the segmenter does not learn to take the length constraint into consideration, but that the segmentation decisions are based on lexical patterns in the data, as also suggested by Karakanta et al. (2020a).

Fine-tuning, even on a minimum amount of real data, as shown when fine-tuning on 10k sentences of MuST-Cinema, can significantly boost the performance compared to the OpenSubtitles models and is a viable and fast solution towards readable subtitles. This corroborates the claim in favour of creating datasets which are representative of the task at hand. Surprisingly though, fine-tuning the OpenSubs-42 model on MC-100 does not improve over training the model from scratch on MC-100 for either test set. For the case when only a small amount of MuST-Cinema data is available (MC-10), having a larger base model on which to fine-tune (OpenSubs-48) is beneficial, since there is an improvement for all metrics and in both testing conditions compared to all other models trained on OpenSubtitles or fine-tuned on them. Therefore, we conclude that, in the presence of little data containing human segmentation decisions, a model trained on more data, even though possibly noisier, is a more robust base on which to fine-tune using the high-quality data. One considerable drawback is that the improvement comes at a training time 25 times longer than that of the other base model (OpenSubs-42), which raises significant considerations for cost and efficiency. Such a model, however, once trained, could be re-used for fine-tuning on several domains and for different client specifications.

5 Analysis and Discussion

We further perform a manual inspection to identify issues related to the models. We hypothesise that low precision is connected to over-splitting or splitting in wrong positions, while low recall suggests under-splitting (not inserting a sufficient number of breaks). Indeed, we observe that the OpenSubtitles models tend to over-segment short sentences, but under-segment longer sentences:

Reference:
Let's turn our attention to the hows.
(37 characters)

OpenSubs-42:
Let's turn our attention
to the hows.
(25 + 12 characters)

Reference:
My family's traditions
and expectations for a woman
wouldn't allow me to own a mobile
phone until I was married.
(22 + 28 + 39 + 20 characters)

OpenSubs-42:
My family's traditions and expectations
for a woman wouldn't allow me to own a mobile phone until I was married.
(39 + 72 characters)

In the following example, fine-tuning on MC increases length conformity, splitting the first subtitle in two, while MC-100k succeeds in segmenting all subtitles exceeding 42 characters, matching the reference segmentation.

Reference:
Meditation is a technique
of finding well-being
in the present moment
before anything happens.

OpenSubs-42:
Meditation is a technique of finding well-being
in the present moment before anything happens.
(47 + 46 characters)

OpenSubs-42 + MC 10K:
Meditation is a technique
of finding well-being
in the present moment before anything happens.
(25 + 21 + 46 characters)

MC-100K:
Meditation is a technique
of finding well-being
in the present moment
before anything happens.

The examples above confirm our results, which showed that the models do not explicitly learn the length constraint, but rather patterns of segmentation. From a syntactic point of view, the break symbols are inserted after a noun (e.g. attention, expectations) and before a preposition/conjunction (to, for, in, before), regardless of the model. The break symbols, even though they do not overlap with the human segmentation decisions, are inserted at plausible positions. This leads to subtitles that present logical completion, i.e. each subtitle is formed by a phrase or syntactic unit, even though they do not respect the length constraint. Conformity to the length constraint seems to be enforced only by the high-quality MuST-Cinema data. It is possible that the artificial break symbols in OpenSubtitles clash with the real break symbols in MuST-Cinema, which creates confusion for the model. Replacing some <eob> with <eol> symbols in OpenSubtitles, to simulate data where human-annotated line breaks exist, means that the models trained on OpenSubtitles observe a line break at positions where normally a subtitle break is present. Given the different functions of the two types of breaks, this is a possible explanation of why fine-tuning OpenSubs-42 on MC-100 performs worse than training on MC-100 from scratch, and it provides us with insights for the future design of artificial segmentation decisions to augment subtitling data.

6 Conclusion

We have presented methods to combine heterogeneous subtitling data in order to improve automatic segmentation of subtitles. We leverage large data containing noisy segmentation decisions from OpenSubtitles and combine them with smaller amounts of high-quality data from MuST-Cinema to generate readable subtitles from full sentences. We found that even limited data with reliable segmentation can improve performance. We conclude that quality matters more than size for determining the break points between subtitles.

Acknowledgments

This work is part of the "End-to-end Spoken Language Translation in Rich Data Conditions" project,4 which is financially supported by an Amazon AWS ML Grant.

4 https://ict.fbk.eu/units-hlt-mt-e2eslt/

References

Aitor Álvarez, Haritz Arzelus, and Thierry Etchegoyhen. 2014. Towards customized automatic segmentation of subtitles. In Advances in Speech and Language Technologies for Iberian Languages, pages 229–238, Cham. Springer International Publishing.

Aitor Álvarez, Carlos-D. Martínez-Hinarejos, Haritz Arzelus, Marina Balenciaga, and Arantza del Pozo. 2017. Improving the automatic segmentation of subtitles through conditional random field. In Speech Communication, volume 88, pages 83–95. Elsevier BV.

E. Bartoll and A. Martínez Tejerina. 2010. The positioning of subtitles for the deaf and hard of hearing. Listening to Subtitles. Subtitles for the Deaf and Hard of Hearing, pages 69–86.

Eduard Bartoll. 2004. Parameters for the classification of subtitles. Topics in Audiovisual Translation, 9:53–60.

Mauro Cettolo, Christian Girardi, and Marcello Federico. 2012. WIT3: Web Inventory of Transcribed and Translated Talks. In Proceedings of the 16th Conference of the European Association for Machine Translation (EAMT), pages 261–268, Trento, Italy, May.

Mattia Antonino Di Gangi, Roldano Cattoni, Luisa Bentivogli, Matteo Negri, and Marco Turchi. 2019. MuST-C: a Multilingual Speech Translation Corpus. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), Minneapolis, MN, USA, June.

Henrik Gottlieb. 2004. Language-political implications of subtitling. Topics in Audiovisual Translation, 9:83–100.

Alina Karakanta, Matteo Negri, and Marco Turchi. 2019. Are Subtitling Corpora really Subtitle-like? In Sixth Italian Conference on Computational Linguistics, CLiC-it.

Alina Karakanta, Matteo Negri, and Marco Turchi. 2020a. Is 42 the answer to everything in subtitling-oriented speech translation? In Proceedings of the 17th International Conference on Spoken Language Translation, pages 209–219, Online, July. Association for Computational Linguistics.

Alina Karakanta, Matteo Negri, and Marco Turchi. 2020b. MuST-Cinema: a Speech-to-Subtitles Corpus. In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC 2020), Marseille, France, May 13-15.

Taku Kudo and John Richardson. 2018. SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 66–71, Brussels, Belgium, November. Association for Computational Linguistics.

Pierre Lison and Jörg Tiedemann. 2016. OpenSubtitles2016: Extracting large parallel corpora from movie and TV subtitles. In Proceedings of the International Conference on Language Resources and Evaluation, LREC.
Danni Liu, Jan Niehues, and Gerasimos Spanakis. 2020. Adapting end-to-end speech recognition for readable subtitles. In Proceedings of the 17th International Conference on Spoken Language Translation, pages 247–256, Online, July. Association for Computational Linguistics.

Evgeny Matusov, Patrick Wilken, and Yota Georgakopoulou. 2019. Customizing neural machine translation for subtitling. In Proceedings of the Fourth Conference on Machine Translation (Volume 1: Research Papers), pages 82–93, Florence, Italy, August. Association for Computational Linguistics.

Mathias Müller and Martin Volk. 2013. Statistical machine translation of subtitles: From OpenSubtitles to TED. In Iryna Gurevych, Chris Biemann, and Torsten Zesch, editors, Language Processing and Knowledge in the Web, pages 132–138, Berlin, Heidelberg. Springer Berlin Heidelberg.

Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. 2019. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. Association for Computational Linguistics.

Elisa Perego. 2008. Subtitles and line-breaks: Towards improved readability. Between Text and Image: Updating Research in Screen Translation, 78(1):211–223.

Dhevi J. Rajendran, Andrew T. Duchowski, Pilar Orero, Juan Martínez, and Pablo Romero-Fresco. 2013. Effects of text chunking on subtitling: A quantitative and qualitative examination. Perspectives, 21(1):5–21.

Hye-Jeong Song, Hong-Ki Kim, Jong-Dae Kim, Chan-Young Park, and Yu-Seop Kim. 2019. Inter-sentence segmentation of YouTube subtitles using long-short term memory (LSTM). 9:1504.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000–6010.