<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG) Shared Task 2021</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Don Tuggener Ahmad Aghaebrahimian</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Zurich University of Applied Sciences (ZHAW)</institution>
          , Winterthur,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the first Sentence End and Punctuation Prediction in Natural Language Generation (SEPP-NLG) shared task, held at the SwissText 2021 conference. The goal of the shared task was to develop solutions for the identification of sentence boundaries and the insertion of punctuation marks into texts produced by NLG systems. The data, the submissions, and the codebase for the shared task are publicly available.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        Sentence end detection, also known as sentence
boundary disambiguation (SBD) or sentence boundary
detection, is the Natural Language Processing (NLP)
task of recognizing where a sentence begins and
ends. A period is the most common end-of-sentence
indicator in written English as well as in many
other Indo-European languages. However, a period
may also appear in a decimal number, an abbreviation,
an email address, or other contexts,
which makes sentence boundary detection a
challenge. Other punctuation marks such as question
and exclamation marks, semicolons, commas, etc.
add to this challenge. Although sentence
boundary detection is considered an almost solved problem
for formal written language
        <xref ref-type="bibr" rid="ref26">(Walker et al., 2001)</xref>
        ,
it poses a challenge in terms of meaning
distortion and readability in synthetic, automatically
translated, or transcribed texts such as the output of
Automatic Speech Recognition (ASR) or Machine
Translation (MT) systems. The punctuation marks
in such synthetic text may be displaced for
several reasons. Detecting the end of a sentence and
placing an appropriate punctuation mark improves
the quality of such texts, not only by preserving
the original meaning but also by enhancing their
readability.
      </p>
      <p>The goal of the SEPP-NLG shared task is to
build models that identify the end of a sentence
and place an appropriate punctuation mark in the
corresponding position.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Like the system proposed by Grefenstette and
Tapanainen (1997), the earliest attempts at
sentence boundary detection utilized sets of rules or
regular expressions. In a different direction,
Reynar and Ratnaparkhi (1997) and Kiss and Strunk
(2006) proposed an information-centric approach
based on the Maximum Entropy model and an
unsupervised method based on collocation
statistics, respectively. Decision tree
        <xref ref-type="bibr" rid="ref22">(Riley,
1989)</xref>
        , Naïve Bayes
        <xref ref-type="bibr" rid="ref12">(López and Pardo, 2015)</xref>
        , and
deep learning based
        <xref ref-type="bibr" rid="ref8">(Kaur and Singh, 2019)</xref>
        models are the most recent machine learning
advances proposed for predicting correct
positions for the period in particular and other
punctuation marks in general. Combining rule-based and
machine learning-based systems, Deepamala and Ramakanth (2012)
proposed a hybrid system with high performance.
      </p>
      <p>Our task is closely related to Tilk and Alumäe
(2016) and follow-up work that uses the Europarl
and TED talk corpora for punctuation prediction.
Similar to our goal, Żelasko et al. (2018) and
Donabauer et al. (2021) investigate sentence
boundary detection in unpunctuated ASR outputs of
spoken dialogues based on textual features. Cho et al.
(2017) propose a method to predict sentence
boundaries and punctuation insertion in a real-time
spoken language translation tool. In a similar
setting, Klejch et al. (2017) include acoustic features
to improve punctuation prediction in a speech
translation system, and Yi and Tao (2019) combine
lexical and speech features for punctuation prediction
in a traditional ASR setting. Finally, Rehbein et al.
(2020) investigate the annotation and detection of
sentence-like units in spoken language transcripts.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Task Overview</title>
      <p>
        Ultimately, the goal of SEPP-NLG is to predict
sentence ends and punctuation in NLG texts. However,
there are no corpora that feature NLG texts and
their manually transcribed and corrected versions.
Therefore, we approximate the setting by a) using
transcripts of spoken texts, and b) lower-casing the
texts and removing all punctuation marks. While
there are multiple corpora of transcribed spoken
language, we choose the Europarl corpus
(http://www.statmt.org/europarl/)
        <xref ref-type="bibr" rid="ref11">(Koehn,
2005)</xref>
        as the source for our data. The Europarl
corpus consists of transcripts of the sessions of
the European Parliament and features transcripts in
multiple languages.
      </p>
      <p>We offer the following subtasks:
• Subtask 1 (fully unpunctuated sentences,
full stop detection): Given the textual content
of an utterance where all punctuation marks
are removed, correctly detect the ends of
sentences by placing a full stop in the appropriate
positions.
• Subtask 2 (fully unpunctuated sentences,
full punctuation marks): Given the textual
content of an utterance where all punctuation
marks are removed, correctly predict all
punctuation marks.</p>
      <p>Participants were free to choose for which
languages and subtasks they contributed a submission,
but were encouraged to participate in all languages.</p>
      <sec id="sec-3-1">
        <title>3.1 Data</title>
        <p>
          We leverage the open parallel corpus (OPUS)
version of the Europarl corpus
(https://opus.nlpl.eu/Europarl.php)
          <xref ref-type="bibr" rid="ref23">(Tiedemann, 2012)</xref>
          for extracting the task data, as it provides sentence
boundaries and tokenization. Although the sentence
boundaries in the corpus are automatically
generated, they are quite reliable, as the data and the
models trained to detect the boundaries contain all
the original punctuation symbols of the transcripts.</p>
          <p>In the spirit of the “Swissness” of the SwissText
conference, at which SEPP-NLG 2021 is co-located,
we select 3 of the 4 official languages of
Switzerland (the fourth, Romansh, is not represented
in Europarl), i.e. German, French, and Italian, and
complement the selection by incorporating English.
Incorporating further languages from the OPUS corpus
using our scripts is seamless, as the data format is
consistent across languages.</p>
          <p>The Europarl corpus contains multiple
punctuation symbols. For subtask 2, we gauged which
subset of them represents a realistic and feasible goal
for automatic prediction in a stream of
unpunctuated, lower-cased tokens. We also considered
which punctuation marks improve the readability
of a text the most. Hence, we consolidated the
selection of punctuation symbols for subtask 2 to
. , ? - : 0 (0 indicating no punctuation) and mapped
the symbols ! and ; to ., the period. We removed all
sentences from the data that contain other
punctuation symbols, such as parentheses, as there is no
straightforward way to remove such punctuation without
interfering with the naturalness of a sentence. This
removal affected the data for both subtasks and
resulted in removing less than 10% of the data per
language. We also removed HTML artifacts and
special (non-visible) characters (zero-width space,
soft hyphen) from the data. Finally, we omitted
sentences with fewer than 3 tokens and documents
with fewer than 2 sentences.</p>
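          <p>To make the consolidation concrete, the following is a minimal Python sketch of the mapping and filtering steps described above; the helper names are ours, and the kept-label set and symbol mapping follow the description in this section:</p>
          <preformat>
import string

KEPT = {".", ",", "?", "-", ":"}   # subtask-2 labels; "0" marks no punctuation
MAPPED = {"!": ".", ";": "."}      # exclamation mark and semicolon folded into the period

def consolidate(tok):
    """Return the subtask-2 label for a punctuation token, or None for
    symbols we cannot map (the containing sentence is then dropped)."""
    tok = MAPPED.get(tok, tok)
    return tok if tok in KEPT else None

def keep_sentence(tokens):
    """Drop sentences that are too short or contain unmappable punctuation."""
    long_enough = len(tokens) >= 3
    # Rough ASCII-punctuation check; a sketch, not the exact filtering used.
    punct = [t for t in tokens if all(c in string.punctuation for c in t)]
    return long_enough and all(consolidate(t) is not None for t in punct)
          </preformat>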
          <p>The data format is as follows: the lower-cased
tokens of each file are listed vertically, and the labels for
subtask 1 (binary classification) and subtask 2 (multiclass
classification) are appended horizontally, separated
by tabs. The labels encode whether a token emits a
sentence end (subtask 1) or a punctuation symbol
(subtask 2). Table 1 shows an example.</p>
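          <p>A minimal Python reader for this format might look as follows (the path handling and encoding are assumptions; only the tab-separated, two-label layout is given above):</p>
          <preformat>
def read_task_file(path):
    """Read one task file: token TAB subtask-1 label TAB subtask-2 label."""
    tokens, labels_t1, labels_t2 = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines, if any
            tok, l1, l2 = line.split("\t")
            tokens.append(tok)
            labels_t1.append(l1)  # "1" if the token ends a sentence, else "0"
            labels_t2.append(l2)  # one of . , ? - : or "0" (no punctuation)
    return tokens, labels_t1, labels_t2
          </preformat>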
          <p>Per language, we randomly selected 80% of the
documents for the training set and 20% for the test
set. From the training set, we then randomly
sampled 20% of the documents as the development
set.</p>
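          <p>A sketch of this document-level split (the random seed is our own addition, for reproducibility):</p>
          <preformat>
import random

def split_documents(doc_ids, seed=42):
    """80% train / 20% test, then 20% of train held out as development set."""
    rng = random.Random(seed)
    doc_ids = list(doc_ids)
    rng.shuffle(doc_ids)
    n_test = int(0.2 * len(doc_ids))
    test, train = doc_ids[:n_test], doc_ids[n_test:]
    n_dev = int(0.2 * len(train))
    dev, train = train[:n_dev], train[n_dev:]
    return train, dev, test
          </preformat>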
          <p>Table 2 shows several statistics of our data. We
see similar properties for all languages: most
sentences are unique, and there are few sentences that
occur both in the train and test sets (duplicate
sentences are often formulaic, administrative ones,
like “The session is adjourned.”). German
features the largest vocabulary, as is expected due
to its morphological richness, and the vocabulary
overlap between train and test sets is roughly 50%
for all languages.</p>
          <p>Concerning the labels, the data is highly skewed
towards the 0 label for both tasks, as most tokens do
not emit a sentence end or punctuation symbol after
them. For example, there are 9'618'776 tokens
with the label 0 and 420'446 with the label 1 for subtask
1 in the English test set, which yields an average
sentence length of almost 24 tokens
((9'618'776 + 420'446) / 420'446 ≈ 23.9). Table 3 shows
a breakdown of the label counts in the English
test set for subtask 2. It shows that the period
and comma symbols have similar counts and are
the most frequent labels among the non-0 labels.
The remaining labels occur at least an order of
magnitude less frequently. These label distribution
properties are similar across all languages.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Surprise Test Data</title>
        <p>
          The Europarl corpus covers domain-specific
language, i.e. political statements in the European
Parliament. To measure how well the participating
systems trained on our data generalize to
out-of-domain data, we incorporated a surprise test set
comprised of TED talk transcripts
(https://opus.nlpl.eu/TED2020.php)
          <xref ref-type="bibr" rid="ref20">(Reimers and
Gurevych, 2020)</xref>
          .
        </p>
        <p>For each language, we sampled 500 TED talks,
favoring those that have the lowest vocabulary
overlap with our Europarl test sets to maximize the
vocabulary shift. The document-based average
percentage of vocabulary overlap ranges from 85%
to 90%, meaning that, on average, 10-15% of the
tokens per document in the surprise test set are
not in the Europarl test set.</p>
        <p>While being one order of magnitude smaller than
the Europarl test set, the surprise test set is also
highly and similarly imbalanced regarding the
label distribution. In the English surprise test set,
there are 67'446 tokens with label 1 and 1'014'464
tokens with label 0. This yields an average
sentence length of 16 tokens, which is considerably
lower than the 24 tokens in the English Europarl
test set. The label counts for subtask 2 follow an
almost identical distribution in both test sets.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Submissions</title>
      <p>
        ZHAW-mbert: We provided a baseline based on
the multilingual BERT model
        <xref ref-type="bibr" rid="ref3">(Devlin et al., 2019)</xref>
        ,
mBERT, implemented with the simpletransformers
library (https://github.com/ThilinaRajapakse/simpletransformers).
We treat the task as a token classification
problem and segment the documents into
subsequent, non-overlapping chunks of length 512 to
adhere to the sequence length restrictions of BERT.
We fine-tuned the model on the training data of
all languages, with a randomly shuffled file order
across all languages and vanilla settings, for about
one week on a single GPU.
      </p>
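      <p>A sketch of this chunking step (an illustrative helper of our own; note that in practice BERT's 512 limit applies to subword units, so a smaller word-level chunk size may be needed):</p>
      <preformat>
def chunk_document(tokens, labels, max_len=512):
    """Split a document into subsequent, non-overlapping chunks."""
    for i in range(0, len(tokens), max_len):
        yield tokens[i:i + max_len], labels[i:i + max_len]
      </preformat>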
      <p>
        ZHAW-adapter-mbert: To contrast the
resource-intensive fine-tuning of mBERT with a
computationally cheaper approach to task adaptation,
we apply the adapter-transformers library
(https://github.com/Adapter-Hub/adapter-transformers/)
        <xref ref-type="bibr" rid="ref16">(Pfeiffer et al., 2020)</xref>
        . Instead of updating all the
weights of the base model (mBERT in our case),
the adapters approach inserts a few feed-forward
layers between the transformer blocks and only
trains those to adapt the base model to a new
task. We again use the vanilla settings and train the
model for one day.
      </p>
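      <p>A minimal sketch of such an adapter setup, assuming the adapter-transformers fork of the transformers package; only the newly inserted adapter weights are updated, while the mBERT weights stay frozen:</p>
      <preformat>
from transformers import AutoModelForTokenClassification  # adapter-transformers fork

model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)  # e.g. subtask 1: binary labels
model.add_adapter("sepp_nlg")          # insert adapter layers into each block
model.train_adapter("sepp_nlg")        # freeze the base model, train the adapter
model.set_active_adapters("sepp_nlg")  # route the forward pass through the adapter
      </preformat>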
      <p>OnPoint: In their study of sentence
segmentation, Michail et al. (2021) proposed a
majority-voting ensemble model consisting of several
Transformer models trained in different ways. The
models’ predictions are combined at test time using a
sliding window to obtain the final predictions. They
offered their system as language-dependent models
for all four languages of the shared task and both
subtasks.</p>
      <p>Unbabel-INESC-ID: Rei et al. (2021) extend
the architecture proposed by Rei et al. (2020) to
develop a multilingual model for sentence end and
punctuation prediction. Their system is based
on pre-trained contextual embeddings and
built on top of a pre-trained Transformer-based
encoder model. They propose their method as a
single multilingual model for all languages and
subtasks of the shared task.</p>
      <p>UR-mSBD: Donabauer and Kruschwitz (2021)
propose a system based on a pre-trained BERT
model that is fine-tuned for the first subtask. They
use language-specific models for each of the four
languages of the shared task. They treat subtask
1 as a binary classification problem, identifying
tokens that indicate the position of a full stop.</p>
      <p>oneNLP: Applying a multi-task ALBERT model for
English and multilingual BERT for the other
languages, Mujadia et al. (2021) explored the impact
of using contextual language models for sentence
end and punctuation prediction. They modeled the
problem in both subtasks as a sequence labeling
task. They presented the results of employing a
CRF baseline, as well as the results of
fine-tuning contextual embeddings.</p>
      <p>
        HULAT-UC3M: Based on the Punctuator
framework
        <xref ref-type="bibr" rid="ref24">(Tilk and Alumäe, 2016)</xref>
        , a
bidirectional recurrent neural network model equipped
with an attention mechanism, Masiello-Ruiz et al.
(2021) developed an automatic punctuation
system named HULAT-UC3M. They trained
HULAT-UC3M individually for all languages as well as both
subtasks of the shared task.
      </p>
      <p>HTW: Guhr et al. (2021) modeled the task as
token-wise prediction and examined several
language models based on the transformer architecture.
They trained two separate models for the two subtasks
and submitted their results for all four languages of
the shared task. They advocated transfer learning
for solving the task and showed that
multilingual transformer models yielded better results than
monolingual models. By pruning BERT layers,
they also showed that their model retains 99% of
its performance when the last quarter of its layers
is removed.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Results</title>
      <p>In Section 3.1 we showed that our data is highly
imbalanced regarding the label distribution. Accuracy
or Macro F1 scores over all labels are not suitable
metrics in this setting: majority-class prediction,
for example, would yield an accuracy of 96% for
subtask 1 on the English test set. Therefore, we
applied the following metrics to evaluate the
participants’ submissions (a scoring sketch follows below):
• Subtask 1: F1 score of the label 1 (the
positive class, i.e. sentence end)
• Subtask 2: Macro F1 of the selected
punctuation symbols</p>
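      <p>A scoring sketch with scikit-learn over flattened per-token label sequences; whether the 0 class enters the subtask 2 macro average is a design choice, and here we score only the punctuation labels:</p>
      <preformat>
from sklearn.metrics import f1_score

def score_subtask1(y_true, y_pred):
    # F1 of the positive class "1" (sentence end).
    return f1_score(y_true, y_pred, pos_label="1", average="binary")

def score_subtask2(y_true, y_pred):
    # Macro F1 over the selected punctuation labels.
    return f1_score(y_true, y_pred, average="macro",
                    labels=[".", ",", "?", "-", ":"])
      </preformat>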
      <p>We observe that a) most systems achieve a very
high score for subtask 1 for all languages on the
Europarl data, and b) the F1 scores are almost
identical (with seemingly minor differences in
precision and recall) for the top-ranking systems for
both tasks. Further, the top-ranking systems are the
same ones for both tasks. This is to be expected
to some degree, as it can be argued that subtask 2
subsumes subtask 1.</p>
      <p>While the F1 scores for subtask 2 seem low
compared to subtask 1, a more detailed analysis of the
results reveals that the lower (Macro) F1 scores mainly
stem from the labels with the lowest counts in the
data. Table 6 gives the detailed classification report
for the top three ranking systems on the English test
set. It shows that the systems are able to predict
periods, commas, and question marks reliably, but
that they struggle with hyphens and colons, which
lowers the Macro F1 scores.</p>
      <p>[Table 6: Per-label scores of the top three ranking
systems (htw+t2k, OnPoint, Unbabel) on the English test
set for subtask 2, covering the labels 0, comma, period,
hyphen, colon, and question mark.]</p>
      <p>All systems perform significantly worse on the
surprise test sets for both tasks. To gauge the
difficulty of the task on the TED dataset compared
to the Europarl dataset, we train the ZHAW-mbert
approach on the remaining TED talks that were not
selected for the surprise test set and then test the
system on the surprise test set. Table 7 shows that
the average F1 score does improve by 11 percentage
points when training the ZHAW-mbert system on
domain data. Still, the 0.66 F1 score is 9 percentage
points behind the average F1 score on the Europarl
data. Hence, the drop in performance of
Europarl-trained ZHAW-mbert on the surprise test
set can be accounted for both by the domain shift
and by the increased difficulty of the target domain
(TED talks). We expect that this applies to the
performance drop of all systems.</p>
      <p>[Table 7: Scores of the ZHAW-mbert system
for subtask 2 (averaged over all languages).]</p>
      <p>We expected some submissions to use linguistic
features such as part-of-speech tags or partial
syntax parse trees and hypothesized that such systems
would fare better on out-of-domain data. However,
all participating systems applied neural encodings
of the surface tokens and did not encode linguistic
features explicitly. Still, the ranking of the systems
remains intact on the surprise test sets.</p>
      <p>The top three systems in both tasks all use
transformer-based approaches and tackle the tasks
in a similar manner. We hypothesize that this is
the main reason for the near-identical performance of
the systems in terms of F1 scores. Based on the
task results, these three systems seem to produce
near-identical output. To better gauge their
similarities and differences, we evaluate their outputs for
subtask 2 in a pair-wise manner on the English test
set. We apply the evaluation metric such that one
system's output takes the role of the ground truth and
the other that of the system prediction, which
yields F1 scores per class that we leverage as
an indicator of the similarity, or agreement, of the
per-token predictions (a sketch of this comparison
follows after Table 8). Table 8 shows the results.
While the macro F1 scores and even the per-class
F1 scores in Table 6 are highly similar, there are
significant differences in this analysis. For
example, for the hyphen class, the systems have different
predictions in over 30% of the cases, and for colon
in roughly 20%. For the majority classes of the
non-0 classes, the systems disagree in about 10%
of the cases for comma, but their predictions are
highly similar for period (96% agreement).</p>
      <p>[Table 8: Pair-wise per-class agreement of the top
three systems on the English test set for subtask 2:
htw+t2k vs. Unbabel, OnPoint vs. Unbabel, and
OnPoint vs. htw+t2k.]</p>
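      <p>The pair-wise comparison above can be sketched as follows; since F1 is symmetric under swapping the reference and the prediction, it is a valid agreement indicator:</p>
      <preformat>
from sklearn.metrics import f1_score

def pairwise_agreement(preds_a, preds_b, labels=(".", ",", "?", "-", ":")):
    """Per-class F1 of system A's output scored against system B's output."""
    scores = f1_score(preds_a, preds_b, labels=list(labels), average=None)
    return dict(zip(labels, scores))
      </preformat>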
      <p>Following Tuggener (2017), we can take the
comparison a step further and analyse the types of
differences per label. For example, the OnPoint
submission’s F1 score for hyphen is 4 percentage
points higher than that of Unbabel, and their
prediction agreement for hyphen is 68%. This does not
indicate, however, whether OnPoint’s predictions
are always better. The aforementioned comparison
takes a ground truth label G, the predicted label A
of one system, and the predicted label B of another
system, and defines three types of differences for
the cases where A ≠ B (a sketch follows below):
• correction: G = B
• new error: G = A
• changed error: G ≠ A ≠ B</p>
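      <p>A sketch of this difference typology over per-token predictions:</p>
      <preformat>
from collections import Counter

def difference_types(gold, preds_a, preds_b):
    """Classify the cases where systems A and B disagree, against gold G."""
    counts = Counter()
    for g, a, b in zip(gold, preds_a, preds_b):
        if a == b:
            continue                      # only disagreements are typed
        if g == b:
            counts["correction"] += 1     # B fixes an error of A
        elif g == a:
            counts["new error"] += 1      # B introduces an error
        else:
            counts["changed error"] += 1  # both differ from gold
    return counts
      </preformat>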
      <p>Table 9 shows the results. We see that the
predictions of commas make up a large portion of
the differences. When OnPoint’s prediction
differs from Unbabel’s for comma, OnPoint is correct
and Unbabel incorrect in nearly 70% of the cases,
which explains the 2 percentage point higher
performance of OnPoint in Table 6. Still, Unbabel is
correct in almost 30% of the cases where the two
predictions differ.
While we showed that there are differences in the
outputs of the top three systems that are not
reflected in the averaged F1 scores, the declared
criteria for winning the task are the averaged F1 scores
in Tables 4 and 5. Since the top three systems in
these tables are practically indistinguishable based
on these F1 scores, we declare OnPoint, htw+t2k,
and Unbabel the joint winners of the SEPP-NLG
2021 shared task. Congratulations!</p>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusions</title>
      <p>We presented the setting and results of the first
Sentence End and Punctuation Prediction in NLG Text
(SEPP-NLG 2021) shared task. We found that all
participants explored neural network-based
models (particularly transformers) to tackle the task.
The results for the in-domain Europarl data were
high for the most common punctuation symbols,
but performance decreased significantly when
the models were faced with out-of-domain data.</p>
      <p>The discussion of the task results during the
session at the SwissText conference yielded the
following desiderata for future iterations of the shared
task:
• More heterogeneous data (more domains)
• Add truecasing as an additional task
• Add other language families
• Take inference time / computational costs as
an additional evaluation criterion, or create a
separate track that puts emphasis on a
low-resource/low-latency setting</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We thank the participants for their submissions and
their valuable feedback on early versions of the
data and task details. This work was funded by
Innosuisse under grant project nr. 43446.1 IP-ICT.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Eunah</given-names>
            <surname>Cho</surname>
          </string-name>
          , Jan Niehues, and
          <string-name>
            <given-names>Alex</given-names>
            <surname>Waibel</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>NMT-based segmentation and punctuation insertion for real-time spoken language translation</article-title>
          .
          <source>In Interspeech</source>
          , pages
          <fpage>2645</fpage>
          -
          <lpage>2649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>N.</given-names>
            <surname>Deepamala</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Ramakanth</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Sentence boundary detection in Kannada language</article-title>
          .
          <source>International Journal of Computer Applications</source>
          ,
          <volume>39</volume>
          :
          <fpage>38</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ming-Wei</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Gregor</given-names>
            <surname>Donabauer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Udo</given-names>
            <surname>Kruschwitz</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>University of Regensburg @ SwissText 2021 SEPP-NLG: Adding sentence structure to unpunctuated text</article-title>
          .
          <source>In Proceedings of the 1st Shared Task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG 2021) at SwissText 2021</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Gregor</given-names>
            <surname>Donabauer</surname>
          </string-name>
          , Udo Kruschwitz, and
          <string-name>
            <given-names>David</given-names>
            <surname>Corney</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Making sense of subtitles: Sentence boundary detection and speaker change detection in unpunctuated texts</article-title>
          .
          <source>In Companion Proceedings of the Web Conference</source>
          <year>2021</year>
          , pages
          <fpage>357</fpage>
          -
          <lpage>362</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Gregory</given-names>
            <surname>Grefenstette</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pasi</given-names>
            <surname>Tapanainen</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>What is a word, what is a sentence? Problems of tokenization</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Oliver</given-names>
            <surname>Guhr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anne-Kathrin</given-names>
            <surname>Schumann</surname>
          </string-name>
          , Frank Bahrmann, and
          <string-name>
            <given-names>Hans-Joachim</given-names>
            <surname>Böhme</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>FullStop: Multilingual deep models for punctuation prediction</article-title>
          .
          <source>In Proceedings of the 1st Shared Task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG 2021) at SwissText</source>
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Jagroop</given-names>
            <surname>Kaur</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jaswinder</given-names>
            <surname>Singh</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Deep neural network based sentence boundary detection and end marker suggestion for social media text</article-title>
          .
          <source>In 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS)</source>
          , pages
          <fpage>292</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Tibor</given-names>
            <surname>Kiss</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Strunk</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Unsupervised multilingual sentence boundary detection</article-title>
          .
          <source>Comput. Linguist.</source>
          ,
          <volume>32</volume>
          (
          <issue>4</issue>
          ):
          <fpage>485</fpage>
          -
          <lpage>525</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Ondrˇej Klejch</surname>
          </string-name>
          ,
          <string-name>
            <surname>Peter Bell</surname>
            , and
            <given-names>Steve</given-names>
          </string-name>
          <string-name>
            <surname>Renals</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features</article-title>
          .
          <source>In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          , pages
          <fpage>5700</fpage>
          -
          <lpage>5704</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Koehn</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Europarl: A parallel corpus for statistical machine translation</article-title>
          .
          <source>Machine Translation Summit</source>
          ,
          <year>2005</year>
          , pages
          <fpage>79</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Roque</given-names>
            <surname>López</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thiago A. S.</given-names>
            <surname>Pardo</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Experiments on sentence boundary detection in user-generated web content</article-title>
          .
          <source>In Computational Linguistics and Intelligent Text Processing</source>
          , pages
          <fpage>227</fpage>
          -
          <lpage>237</lpage>
          , Cham. Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Jose Manuel</given-names>
            <surname>Masiello-Ruiz</surname>
          </string-name>
          , Jose Luis Lopez Cuadrado, and
          <string-name>
            <given-names>Paloma</given-names>
            <surname>Martinez</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Participation of HULAT-UC3M in SEPP-NLG 2021 shared task</article-title>
          .
          <source>In Proceedings of the 1st Shared Task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG 2021) at SwissText</source>
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Andrianos</given-names>
            <surname>Michail</surname>
          </string-name>
          , Silvan Wehrli, and Terézia Bucková.
          <year>2021</year>
          .
          <article-title>UZH OnPoint at SwissText-2021: Sentence end and punctuation prediction in NLG text through ensembling of different transformers</article-title>
          .
          <source>In Proceedings of the 1st Shared Task on Sentence End and Punctuation Prediction in NLG Text (SEPPNLG 2021) at SwissText</source>
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Vandan</given-names>
            <surname>Mujadia</surname>
          </string-name>
          , Pruthwik Mishra, and
          <string-name>
            <given-names>Dipti Misra</given-names>
            <surname>Sharma</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Deep contextual punctuator for NLG text</article-title>
          .
          <source>In Proceedings of the 1st Shared Task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG 2021) at SwissText</source>
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Jonas</given-names>
            <surname>Pfeiffer</surname>
          </string-name>
          , Andreas Rücklé,
          <string-name>
            <given-names>Clifton</given-names>
            <surname>Poth</surname>
          </string-name>
          , Aishwarya Kamath, Ivan Vulić,
          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Ruder</surname>
          </string-name>
          , Kyunghyun Cho, and
          <string-name>
            <given-names>Iryna</given-names>
            <surname>Gurevych</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>AdapterHub: A framework for adapting transformers</article-title>
          .
          <source>In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          , pages
          <fpage>46</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Ines</given-names>
            <surname>Rehbein</surname>
          </string-name>
          , Josef Ruppenhofer, and
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Schmidt</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Improving sentence boundary detection for spoken language transcripts</article-title>
          .
          <source>In Proceedings of the 12th International Conference on Language Resources and Evaluation (LREC)</source>
          ,
          <source>May 11- 16</source>
          ,
          <year>2020</year>
          , Palais du Pharo, Marseille, France, pages
          <fpage>7102</fpage>
          -
          <lpage>7111</lpage>
          . European Language Resources Association.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Rei</surname>
          </string-name>
          , Fernando Batista,
          <string-name>
            <given-names>Nuno M.</given-names>
            <surname>Guerreiro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Luisa</given-names>
            <surname>Coheur</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Multilingual simultaneous sentence end and punctuation prediction</article-title>
          .
          <source>In Proceedings of the 1st Shared Task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG 2021) at SwissText</source>
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Ricardo</given-names>
            <surname>Rei</surname>
          </string-name>
          , Nuno Miguel Guerreiro, and
          <string-name>
            <given-names>Fernando</given-names>
            <surname>Batista</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Automatic truecasing of video subtitles using BERT: A multilingual adaptable approach</article-title>
          .
          <source>In Information Processing and Management of Uncertainty in Knowledge-Based Systems</source>
          , pages
          <fpage>708</fpage>
          -
          <lpage>721</lpage>
          , Cham. Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Nils</given-names>
            <surname>Reimers</surname>
          </string-name>
          and
          <string-name>
            <given-names>Iryna</given-names>
            <surname>Gurevych</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Making monolingual sentence embeddings multilingual using knowledge distillation</article-title>
          .
          <source>In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>4512</fpage>
          -
          <lpage>4525</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Jeffrey C.</given-names>
            <surname>Reynar</surname>
          </string-name>
          and
          <string-name>
            <given-names>Adwait</given-names>
            <surname>Ratnaparkhi</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>A maximum entropy approach to identifying sentence boundaries</article-title>
          .
          <source>ANLC '97, pages 16-19</source>
          , USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Michael D.</given-names>
            <surname>Riley</surname>
          </string-name>
          .
          <year>1989</year>
          .
          <article-title>Some applications of tree-based modelling to speech and language</article-title>
          .
          <source>HLT '89, pages 339-352</source>
          , USA. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Jörg</given-names>
            <surname>Tiedemann</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Parallel data, tools and interfaces in OPUS</article-title>
          .
          <source>In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)</source>
          , Istanbul, Turkey.
          <source>European Language Resources Association (ELRA).</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Ottokar</given-names>
            <surname>Tilk</surname>
          </string-name>
          and Tanel Alumäe.
          <year>2016</year>
          .
          <article-title>Bidirectional recurrent neural network with attention mechanism for punctuation restoration</article-title>
          .
          <source>In INTERSPEECH.</source>
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Don</given-names>
            <surname>Tuggener</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A method for in-depth comparative evaluation: How (dis)similar are outputs of POS taggers, dependency parsers and coreference resolvers really</article-title>
          ?
          <source>In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers</source>
          , pages
          <fpage>188</fpage>
          -
          <lpage>198</lpage>
          , Valencia, Spain. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Daniel J.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>David E.</given-names>
            <surname>Clements</surname>
          </string-name>
          , Maki Darwin, and
          <string-name>
            <given-names>Jan W.</given-names>
            <surname>Amtrup</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Sentence boundary detection: A comparison of paradigms for improving MT quality</article-title>
          .
          <source>In Proceedings of MT Summit VIII, Santiago de Compostela</source>
          , pages
          <fpage>18</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Jiangyan</given-names>
            <surname>Yi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jianhua</given-names>
            <surname>Tao</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Self-attention based model for punctuation prediction using word and speech embeddings</article-title>
          .
          <source>In ICASSP 2019- 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</source>
          , pages
          <fpage>7270</fpage>
          -
          <lpage>7274</lpage>
          . IEEE.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Piotr</given-names>
            <surname>Żelasko</surname>
          </string-name>
          , Piotr Szymański, Jan Mizgajski, Adrian Szymczak, Yishay Carmiel, and
          <string-name>
            <given-names>Najim</given-names>
            <surname>Dehak</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Punctuation prediction model for conversational speech</article-title>
          .
          <source>Proc. Interspeech</source>
          <year>2018</year>
          , pages
          <fpage>2633</fpage>
          -
          <lpage>2637</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>