=Paper=
{{Paper
|id=Vol-2957/sepp_paper3
|storemode=property
|title=Multilingual Simultaneous Sentence End and Punctuation Prediction (short paper)
|pdfUrl=https://ceur-ws.org/Vol-2957/sepp_paper3.pdf
|volume=Vol-2957
|authors=Ricardo Rei,Fernando Batista,Nuno M. Guerreiro,Luisa Coheur
|dblpUrl=https://dblp.org/rec/conf/swisstext/ReiBGC21
}}
==Multilingual Simultaneous Sentence End and Punctuation Prediction (short paper)==
Multilingual Simultaneous Sentence End and Punctuation Prediction

Ricardo Rei (Unbabel / INESC-ID, Instituto Superior Técnico) — ricardo.rei@unbabel.com
Fernando Batista (INESC-ID / ISCTE - Instituto Universitário de Lisboa) — fernando.batista@inesc-id.pt
Nuno M. Guerreiro (Instituto de Telecomunicações / Instituto Superior Técnico) — nuno.s.guerreiro@tecnico.pt
Luisa Coheur (INESC-ID / Instituto Superior Técnico) — luisa.coheur@inesc-id.pt

Abstract

This paper describes the model and its corresponding setup, proposed by the Unbabel & INESC-ID team for the 1st Shared Task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG 2021). The shared task covers 4 languages (English, German, French and Italian) and includes two subtasks: subtask 1 – detecting the end of a sentence, and subtask 2 – predicting a range of punctuation marks. Our team proposes a single multilingual and multitask model that is able to produce suitable results for all the languages and subtasks involved. The results show that it is possible to achieve state-of-the-art results using one single multilingual model for both tasks and multiple languages. Using a single multilingual model to solve the task for multiple languages is of particular importance, since training a different model for each language is a cumbersome and time-consuming process. Finally, the code for the shared task is publicly available for reproducibility purposes at https://github.com/Unbabel/caption/tree/shared-task.

Copyright ©2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

The text produced by a speech recognition system or by an automatic machine translation system often includes misplaced punctuation and, in the case of a speech recognition system, the output often consists of raw single-case words, without punctuation marks, and may not even include sentence boundaries. Detecting the sentence boundaries and the missing punctuation in such automatically generated texts improves their quality, and is often relevant for a number of downstream tasks, such as parsing, information extraction, dialog act modeling, Named Entity Recognition (NER), and summarization (Zechner, 2002; Huang and Zweig, 2002; Kim and Woodland, 2003; Ostendorf et al., 2005; Jones et al., 2005; Makhoul et al., 2005; Shriberg, 2005; Matusov et al., 2006; Cattoni et al., 2007; Ostendorf et al., 2008; Peitz et al., 2011; Liao et al., 2020).

Most of the available studies focus on the full stop and the comma, which have higher corpus frequencies, and a number of more restricted studies also consider the question mark. However, several punctuation marks can be considered for automatically generated texts, including: comma; period or full stop; exclamation mark; question mark; colon; semicolon; and quotation marks. Nevertheless, most of these marks rarely occur and are quite difficult to insert or evaluate. Quotation marks and semicolons, for example, are often used inconsistently and in a highly variable way.

This paper proposes a multilingual model that is able to detect sentence boundaries and predict a wide range of punctuation marks, based on pre-trained contextual embeddings. Our architecture is composed of three main building blocks: a pre-trained Transformer-based encoder model, an attention mechanism over the encoder layers, and the task classification heads. The proposed model derives from the multilingual model proposed by Guerreiro et al. (2021), which achieves fairly competitive results in a multi-language scenario, even surpassing the existing results for some of the languages.

The remainder of the paper is organized as follows: Section 2 presents an overview of the related work. Section 3 overviews the data used for fine-tuning our model. Section 4 presents the building blocks of the model architecture and the setup parameters. Section 5 reports the experiments performed and Section 6 presents the corresponding results. Finally, Section 7 presents the most relevant conclusions and mentions possible future directions.

2 Related work

Proper identification of sentence boundaries and punctuation recovery are two profoundly connected tasks whose solution can bring great improvements to speech processing downstream tasks (Harper et al., 2005; Mrozinsk et al., 2006; Ostendorf et al., 2008). For that reason, recovering structural information from text produced by Automatic Speech Recognition (ASR) became the objective of many studies. Early studies used a combination of n-grams with prosodic classifiers through the general Hidden Markov Models framework (Beeferman et al., 1998; Christensen et al., 2001; Kim and Woodland, 2001). With the development of Conditional Random Fields (CRF) and Maximum Entropy models, researchers were able to further improve on these tasks (Huang and Zweig, 2002; Liu et al., 2005, 2006; Batista et al., 2007, 2008, 2009; Lu and Ng, 2010; Batista et al., 2010, 2012; Ueffing et al., 2013).

Regarding machine translation, it is a well-known fact that punctuation and capitalization errors are a predominant problem for Statistical Machine Translation (SMT). Several studies tried to enrich the SMT output by inserting proper capitalization and punctuation in the returned translation (Cattoni et al., 2007; Peitz et al., 2011). Even with Neural Machine Translation (NMT), punctuation errors are still the most predominant type of error: they represent around 20% of the errors produced by the high-performing systems from the WMT20 News Translation shared task (Freitag et al., 2021).

Most of the recent approaches for punctuation restoration are based on neural networks such as Recurrent Neural Networks (RNN) and Transformers, and most works treat the problem either as a sequence-to-sequence or as a sequence labelling task (Tilk and Alumäe, 2015, 2016; Che et al., 2016; Klejch et al., 2017; Yi and Tao, 2019; Kim, 2019). Following the recent trends in Natural Language Processing (NLP), some of these works take advantage of pre-trained models such as BERT (Cai and Wang, 2019; Makhija et al., 2019; Guerreiro et al., 2021). Our shared task participation is mostly based on the work by Guerreiro et al. (2021), which showed that having one single multilingual model is competitive with having one model trained for each language.

3 Corpora

The SEPP-NLG challenge adopted the Europarl corpus, covering English, German, French, and Italian. The corpus was previously processed in order to remove punctuation marks and case information, as a way to simulate Natural Language Generated text. The challenge considers 5 different punctuation marks: comma (,), full-stop (.), dash (-), colon (:) and question mark (?). Figures 1 and 2 show the frequency of the words and punctuation marks for each one of the languages, considering the training and development sets.

[Figure 1: Frequency of each punctuation mark. (Bar chart, log scale: counts of words, commas, full-stops, dashes, colons and question marks for EN, DE, FR and IT.)]

[Figure 2: Frequency of each punctuation mark. (Relative frequencies per language; comma and full-stop dominate in all four languages.)]

As expected, from all the punctuation marks being considered, the comma is the most frequent, occurring between 52.9% (EN) and 60.1% (DE) of the time, followed by the full-stop, occurring between 42.0% (EN) and 35.2% (DE) of the time. All the other punctuation marks under consideration occur less than 0.24% of the time for all the considered languages. About 95% of the sentences contain between 3 and 50 words, but the maximum sentence length is 303 words for EN, 450 for DE and IT, and 423 for FR. 99% of the sentences contain 1 to 7 punctuation marks, including the corresponding sentence boundary. However, some of the sentences, mostly consisting of lists of numbers, may contain up to about 200 commas.

4 System Description

As previously mentioned, our system architecture extends the architecture proposed by Guerreiro et al. (2021), which has shown promising results in multilingual punctuation prediction and capitalization (Rei et al., 2020). This architecture is composed of 3 modules: an Encoder Model, a Layer-wise Attention Mechanism, and a Classification Head. In our experiments for the shared task we replaced the XLM-R base encoder with XLM-R large (Conneau et al., 2020) and also added a new binary classification head for subtask 1 (full-stop prediction).

When our system receives a document, that document is tokenized using the XLM-R tokenizer and divided into several input sequences x_i = (x_{i,0}, x_{i,1}, ..., x_{i,511}) of 512 sub-words. Then, for each input sequence, the encoder produces an embedding e^{(ℓ)}_{x_{i,j}} for each sub-word x_{i,j} and each layer ℓ ∈ {0, 1, ..., 24}. To encapsulate information from all transformer layers into a single embedding e_{x_{i,j}}, the following layer-wise attention mechanism is used:

    e_{x_{i,j}} = γ E_{x_{i,j}} Λ    (1)

where γ is a trainable scaling factor, E_{x_{i,j}} = [e^{(0)}_{x_{i,j}}, e^{(1)}_{x_{i,j}}, ..., e^{(24)}_{x_{i,j}}] corresponds to the vector of layer embeddings for sub-word x_{i,j}, and Λ = softmax([λ^{(1)}, λ^{(2)}, ..., λ^{(24)}]) is a vector constituted by the layer scalar trainable parameters, which are shared for every sub-word. Finally, we concatenate the embeddings of consecutive words in the input sequence x_i (when a word is divided into several sub-words, we use the embedding of the first sub-word to represent the entire word) and use those as features for our punctuation (ML – multi-label) and full-stop (B – binary) classification heads. Figure 3 illustrates the described architecture.

[Figure 3: Model architecture used to compete on the SEPP-NLG 2021 shared task. This model follows the architecture proposed by Guerreiro et al. (2021), but with a classification head that simultaneously predicts sentence ends (binary classification) and punctuation marks (multinomial classification).]
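As a minimal sketch (pure Python with illustrative names; the actual implementation is in the linked repository and operates on the encoder's tensors), Equation 1 amounts to a softmax-weighted sum of a sub-word's per-layer embeddings, scaled by γ:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scalars.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def layerwise_attention(layer_embeddings, lambdas, gamma):
    """Collapse per-layer sub-word embeddings into a single embedding.

    layer_embeddings: one vector per encoder layer, for one sub-word.
    lambdas: one trainable scalar per layer, shared across all sub-words.
    gamma: trainable scaling factor.
    Computes e_x = gamma * sum_l softmax(lambdas)_l * e_x^(l), as in Eq. 1.
    """
    weights = softmax(lambdas)
    dim = len(layer_embeddings[0])
    pooled = [0.0] * dim
    for w, emb in zip(weights, layer_embeddings):
        for k in range(dim):
            pooled[k] += w * emb[k]
    return [gamma * v for v in pooled]
```

With equal layer weights the result is simply γ times the average of the layer embeddings; training shifts the λ weights towards the most informative layers.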
5 Experiments

We started our experiments with the exact same hyper-parameters used by Guerreiro et al. (2021). To achieve better performance, we also ran a hyper-parameter search using Optuna (Akiba et al., 2019). In this section we describe the training setup and the evaluation metrics used for these experiments.

[Figure 4: Best trial hyper-parameters highlighted in the Optuna search space.]

5.1 Evaluation Setup

The official shared task metric for full-stop prediction is the F1 score of the positive class (sentence end). For the punctuation prediction sub-task, the official metric is Macro-F1. Since our model performs both tasks at the same time, we also combine those two metrics by multiplying them. Following Guerreiro et al. (2021), we additionally measure the punctuation Slot Error Rate (SER) (Makhoul et al., 2005), a commonly used metric for the task at hand. Also, we discard the "O" (no punctuation) label when computing our Macro-F1 scores.
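The metric combination described above can be sketched as follows (a hypothetical, simplified scorer over aligned token-label sequences; the official scores come from the shared task scorer):

```python
def f1_for_label(gold, pred, label):
    # Precision/recall/F1 of a single label over aligned token sequences.
    tp = sum(1 for g, p in zip(gold, pred) if g == p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def combined_score(gold_punct, pred_punct, gold_end, pred_end):
    """Macro-F1 over punctuation labels (discarding the "O" label) times the
    binary F1 of the sentence-end positive class, mirroring Section 5.1."""
    labels = sorted({l for l in gold_punct if l != "O"})
    macro = sum(f1_for_label(gold_punct, pred_punct, l) for l in labels) / len(labels)
    binary = f1_for_label(gold_end, pred_end, 1)
    return macro * binary
```

Note that because "O" is excluded, a model that only ever predicts "no punctuation" scores zero on the punctuation metric even though "O" is by far the majority class.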
5.2 Training Setup

Our model uses a discriminative fine-tuning strategy with gradual unfreezing, splitting the model parameters into two groups: the XLM-R parameters and the classification heads on top. The encoder parameters are frozen for the first 0.1% of the steps of the first epoch. This allows the parameters of the classification heads to adjust to the task objective before changing the pre-trained ones. Then, the entire set of model parameters is fine-tuned, except for the embedding layer, which is kept frozen. Keeping the embedding layer frozen allows us to save some GPU memory and fit the entire model into a single 12GB-memory GPU.

Evaluation is performed after each epoch using only 50% of the entire development data. Training is interrupted after 2 epochs without improvements on the punctuation task Macro-F1.

              Predicted labels
True labels   Comma     Full-stop   Dash    Colon   Q. mark
Comma         1443471   36286       8252    3667    1142
Full-stop     39929     1164106     622     4255    2442
Dash          32925     4453        18632   1154    99
Colon         5106      16404       280     24149   101
Q. mark       1493      3234        44      50      34635

Figure 5: Confusion Matrix for punctuation prediction (rows: true labels; columns: predicted labels).

5.3 Hyper-parameter Search

We used Optuna (Akiba et al., 2019) to search for the optimal hyper-parameters for our model. Our search space was defined as follows:

• Accumulate gradients for 1 to 32 batches (this simulates bigger batches while avoiding memory issues);
• Classification heads dropout between 0.1 and 0.5, sampled from a uniform distribution;
• Layer-wise learning rate decay between 0.75 and 1.0, sampled from a uniform distribution;
• Encoder model learning rate between 1e-05 and 1e-04, sampled from a log-uniform distribution;
• Classification heads learning rate between 1e-05 and 3e-04, sampled from a log-uniform distribution;
• Full-stop prediction loss weight with two possible values: 1 and 2;
• Punctuation prediction loss weight with three possible values: 1, 2 and 3.
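The search space above can be sketched as a sampler (a plain-Python stand-in for the Optuna study definition; the parameter names are illustrative, not those of the actual code):

```python
import math
import random

def sample_trial(rng):
    """Draw one configuration from the Section 5.3 search space."""
    # Log-uniform sampling: uniform in log-space, then exponentiate.
    log_uniform = lambda lo, hi: math.exp(rng.uniform(math.log(lo), math.log(hi)))
    return {
        "accumulate_grad_batches": rng.randint(1, 32),
        "heads_dropout": rng.uniform(0.1, 0.5),
        "layerwise_decay": rng.uniform(0.75, 1.0),
        "encoder_lr": log_uniform(1e-5, 1e-4),
        "heads_lr": log_uniform(1e-5, 3e-4),
        "binary_loss_weight": rng.choice([1, 2]),
        "punct_loss_weight": rng.choice([1, 2, 3]),
    }
```

Log-uniform sampling for the learning rates means the search spends as much effort distinguishing 1e-05 from 3e-05 as 3e-05 from 1e-04, which suits a quantity whose effect is roughly multiplicative.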
To speed up the hyper-parameter search, we used only 50% of the available training data, while keeping the 50% of development data described above.

Table 2 reports the results of our baseline against the large models with default hyper-parameters and against the best trial from Optuna. As expected, the biggest improvement comes from using XLM-R large in place of the base model. We can also observe that further hyper-parameter tuning helps, especially in terms of the SER.

Figure 4 shows that the best results were achieved by keeping the encoder learning rate low, with a high layer-wise decay (above 0.9). The learning rate for the classification heads is almost 10× higher than the encoder learning rate. Finally, the weight of the punctuation prediction loss is set to 2× the weight of the binary prediction loss. Table 1 describes the hyper-parameters used in our baseline along with our final submission.

6 Results

Table 2 shows that, as expected, using a larger encoder improves our results. Also, by using Optuna we were able to further improve our results, which means that the models presented by Guerreiro et al. (2021) are under-tuned and could be further improved with a better selection of hyper-parameters.

Looking into the results for individual punctuation marks, we can observe that our final submission has a high F1 for commas, full stops and question marks (96%, 94% and 89%, respectively). Yet, the model seems to struggle at predicting dashes and colons (63% and 39% F1, respectively). Looking at Figure 5, we can observe that, as expected, dashes and colons are frequently confused with commas and full stops, respectively. These marks can often be interchanged without loss of meaning. This is further evidence to support the rationale of some proposed approaches to solve this task (Tilk and Alumäe, 2015; Che et al., 2016; Guerreiro et al., 2021), in which dashes and colons tend to be aggregated with the comma and full-stop labels, respectively.
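As a sanity check, per-class precision, recall and F1 can be recomputed directly from the Figure 5 counts (a hypothetical recomputation; the comma and question-mark values round to the 96% and 89% quoted above, while the dash and colon values differ from the quoted 63% and 39%, so those presumably come from the official scorer over the full label set rather than from this matrix alone):

```python
def per_class_f1(matrix, labels):
    """Per-class F1 from a confusion matrix (rows = true, cols = predicted)."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        row = sum(matrix[i])              # tokens truly of this class
        col = sum(r[i] for r in matrix)   # tokens predicted as this class
        prec = tp / col if col else 0.0
        rec = tp / row if row else 0.0
        scores[label] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return scores

LABELS = ["comma", "full-stop", "dash", "colon", "qmark"]
CONFUSION = [  # counts transcribed from Figure 5
    [1443471, 36286, 8252, 3667, 1142],
    [39929, 1164106, 622, 4255, 2442],
    [32925, 4453, 18632, 1154, 99],
    [5106, 16404, 280, 24149, 101],
    [1493, 3234, 44, 50, 34635],
]
```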
7 Conclusions and future work

We have described a multilingual model that is able to simultaneously detect sentence boundaries and predict 5 different punctuation marks over 4 different languages (English, German, French and Italian). The model was adapted from Guerreiro et al. (2021) and used by the Unbabel & INESC-ID team for the 1st Shared Task on Sentence End and Punctuation Prediction in NLG Text (SEPP-NLG 2021), achieving one of the top results. The results confirm that it is possible to achieve state-of-the-art results using a single multilingual model for both tasks and multiple languages. This supports what was already observed in the experiments performed by Guerreiro et al. (2021). The code used to produce the results is publicly available at: https://github.com/Unbabel/caption/tree/shared-task.

Hyper-parameter             Baseline        Final submission
Encoder Model               XLM-R (base)    XLM-R (large)
Optimizer                   AdamW           AdamW
nº frozen epochs            0.1             0.1
Learning rate               5e-05           2.37e-04
Encoder Learning Rate       3e-05           2.57e-05
Layerwise Decay             1.0             0.925
Batch size                  12              8
Loss function               Cross-Entropy   Cross-Entropy
Binary Loss Weight          1               1
Punctuation Loss Weight     1               2
Dropout                     0.1             0.125
FP precision                32              16

Table 1: Hyper-parameters used in our final submission compared with the baseline hyper-parameters from Guerreiro et al. (2021).

Models                             SER↓    Binary F1↑   Macro F1↑   Macro×Binary↑
Baseline (Guerreiro et al., 2021)  0.265   0.926        0.399       0.369
XLM-R large (default)              0.243   0.944        0.411       0.388
XLM-R large Optuna                 0.214   0.944        0.444       0.419

Table 2: Results of our models on the shared task development data. Our baseline model is trained with the exact same setup as the multilingual models from Guerreiro et al. (2021). We then replaced XLM-R base with XLM-R large. Finally, to further improve our results, we used Optuna to search over the hyper-parameter space described in Section 5.3. Note that these experiments were performed using the shared task corpus V1.

In the future, we plan to extend this work to include other language families, such as Semitic and Slavic languages. Moreover, we would like to extend our setup to be capable of simultaneously solving the capitalization task too. Having one single multilingual model that is capable of identifying sentence boundaries, punctuation marks and proper capitalization would constitute a major step towards recovering from ASR recognition errors and translation errors from MT systems.

Acknowledgments

This work was supported by national funds through FCT, Fundação para a Ciência e a Tecnologia, under project UIDB/50021/2020, and by the P2020 Program through projects "Unbabel Scribe" and "MAIA", supervised by ANI under contract numbers 038510 and 045909, respectively.

References

Takuya Akiba, Shotaro Sano, Toshihiko Yanase, Takeru Ohta, and Masanori Koyama. 2019. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD '19, pages 2623–2631, New York, NY, USA. Association for Computing Machinery.

F. Batista, D. Caseiro, N. Mamede, and I. Trancoso. 2008. Recovering capitalization and punctuation marks for automatic speech recognition: Case study for Portuguese broadcast news. Speech Communication, 50(10):847–862.

Fernando Batista, Diamantino Caseiro, Nuno J. Mamede, and Isabel Trancoso. 2007. Recovering punctuation marks for automatic speech recognition. In INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, Antwerp, Belgium, August 27-31, 2007, pages 2153–2156. ISCA.

Fernando Batista, Helena Moniz, Isabel Trancoso, and Nuno J. Mamede. 2012. Bilingual experiments on automatic recovery of capitalization and punctuation of automatic speech transcripts. IEEE Transactions on Audio, Speech and Language Processing, Special Issue on New Frontiers in Rich Transcription, 20(2):474–485.

Fernando Batista, Helena Moniz, Isabel Trancoso, Hugo Meinedo, Ana Isabel Mata, and Nuno J. Mamede. 2010. Extending the punctuation module for European Portuguese. In INTERSPEECH 2010, 11th Annual Conference of the International Speech Communication Association, Makuhari, Chiba, Japan, September 26-30, 2010, pages 1509–1512. ISCA.

Fernando Batista, Isabel Trancoso, and Nuno J. Mamede. 2009. Comparing automatic rich transcription for Portuguese, Spanish and English broadcast news. In Automatic Speech Recognition and Understanding, ASRU 2009, IEEE Workshop on, pages 540–545. IEEE.

Doug Beeferman, Adam Berger, and John Lafferty. 1998. Cyberpunc: a lightweight punctuation annotation system for speech. In ICASSP, pages 689–692.

Y. Cai and D. Wang. 2019. Question mark prediction by BERT. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pages 363–367.

Roldano Cattoni, Nicola Bertoldi, and Marcello Federico. 2007. Punctuating confusion networks for speech translation. In INTERSPEECH 2007, 8th Annual Conference of the International Speech Communication Association, Antwerp, Belgium, August 27-31, 2007, pages 2453–2456. ISCA.

Xiaoyin Che, Cheng Wang, Haojin Yang, and Christoph Meinel. 2016. Punctuation prediction for unsegmented transcript based on word vector. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 654–658, Portorož, Slovenia. European Language Resources Association (ELRA).

H. Christensen, Y. Gotoh, and S. Renals. 2001. Punctuation annotation using statistical prosody models. In Proc. of the ISCA Workshop on Prosody in Speech Recognition and Understanding, pages 35–40.

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Online. Association for Computational Linguistics.

Markus Freitag, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, and Wolfgang Macherey. 2021. Experts, errors, and context: A large-scale study of human evaluation for machine translation.

Nuno Miguel Guerreiro, Ricardo Rei, and Fernando Batista. 2021. Towards better subtitles: A multilingual approach for punctuation restoration of speech transcripts. Expert Systems With Applications (under review).

Mary Harper, Bonnie Dorr, John Hale, Brian Roark, Ishak Shafran, Matthew Lease, Yang Liu, Matthew Snover, Lisa Yung, Anna Krasnyanskaya, and Robin Stewart. 2005. Parsing and spoken structural event detection. In 2005 Johns Hopkins Summer Workshop Final Report.

Jing Huang and Geoffrey Zweig. 2002. Maximum entropy modeling for punctuation from speech. In Proceedings of ICSLP.

D. Jones, E. Gibson, W. Shen, N. Granoien, M. Herzog, D. Reynolds, and C. Weinstein. 2005. Measuring human readability of machine generated text: three case studies in speech recognition and machine translation. In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), volume 5, pages v/1009–v/1012.

J. Kim and P. C. Woodland. 2001. The use of prosody in a combined system for punctuation generation and speech recognition. In Proc. of Eurospeech, pages 2757–2760.

Ji-Hwan Kim and Philip C. Woodland. 2003. A combined punctuation generation and speech recognition system and its performance enhancement using prosody. Speech Communication, 41(4):563–577.

Seokhwan Kim. 2019. Deep recurrent neural networks with layer-wise multi-head attentions for punctuation restoration. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7280–7284.

Ondrej Klejch, Peter Bell, and Steve Renals. 2017. Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5700–5704.

Junwei Liao, Sefik Emre Eskimez, Liyang Lu, Yu Shi, Ming Gong, Linjun Shou, Hong Qu, and Michael Zeng. 2020. Improving readability for automatic speech recognition transcription.

Yang Liu, Elizabeth Shriberg, Andreas Stolcke, Dustin Hillard, Mari Ostendorf, and Mary Harper. 2006. Enriching speech recognition with automatic detection of sentence boundaries and disfluencies. IEEE Transactions on Audio, Speech and Language Processing, 14(5):1526–1540.

Yang Liu, Elizabeth Shriberg, Andreas Stolcke, Barbara Peskin, Jeremy Ang, Dustin Hillard, Mari Ostendorf, Marcus Tomalin, Phil Woodland, and Mary Harper. 2005. Structural metadata research in the EARS program. In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), Philadelphia, USA.

Wei Lu and Hwee Tou Ng. 2010. Better punctuation prediction with dynamic conditional random fields. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 177–186, Cambridge, MA. Association for Computational Linguistics.

K. Makhija, T. Ho, and E. Chng. 2019. Transfer learning for punctuation prediction. In 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pages 268–273.

J. Makhoul, A. Baron, I. Bulyko, L. Nguyen, L. Ramshaw, D. Stallard, R. Schwartz, and B. Xiang. 2005. The effects of speech recognition and punctuation on information extraction. In INTERSPEECH-05, pages 57–60.

Evgeny Matusov, Arne Mauser, and Hermann Ney. 2006. Automatic sentence segmentation and punctuation prediction for spoken language translation. In International Workshop on Spoken Language Translation, pages 158–165, Kyoto, Japan.

Joanna Mrozinsk, Edward WD Whittaker, Pierre Chatain, and Sadaoki Furui. 2006. Automatic sentence segmentation of speech for automatic summarization. In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '06).

M. Ostendorf, E. Shriberg, and A. Stolcke. 2005. Human language technology: Opportunities and challenges. In Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05), Philadelphia.

Mari Ostendorf, Benoit Favre, Ralph Grishman, Dilek Hakkani-Tür, Mary Harper, Dustin Hillard, Julia Hirschberg, Heng Ji, Jeremy G. Kahn, Yang Liu, Sameer Maskey, Evgeny Matusov, Hermann Ney, Andrew Rosenberg, Elizabeth Shriberg, Wen Wang, and Chuck Wooters. 2008. Speech segmentation and spoken document processing. IEEE Signal Processing Magazine, 25(3):59–69.

Stephan Peitz, Markus Freitag, Arne Mauser, and Hermann Ney. 2011. Modeling punctuation prediction as machine translation. In International Workshop on Spoken Language Translation, pages 238–245, San Francisco, CA, USA.

Ricardo Rei, Nuno Miguel Guerreiro, and Fernando Batista. 2020. Automatic truecasing of video subtitles using BERT: A multilingual adaptable approach. In Information Processing and Management of Uncertainty in Knowledge-Based Systems, pages 708–721, Cham. Springer International Publishing.

Elisabeth Shriberg. 2005. Spontaneous speech: How people really talk, and why engineers should care. In Proc. of Eurospeech - 9th European Conference on Speech Communication and Technology (Interspeech 2005), pages 1781–1784, Lisbon, Portugal.

Ottokar Tilk and Tanel Alumäe. 2015. LSTM for punctuation restoration in speech transcripts. In INTERSPEECH.

Ottokar Tilk and Tanel Alumäe. 2016. Bidirectional recurrent neural network with attention mechanism for punctuation restoration. In INTERSPEECH, pages 3047–3051.

Nicola Ueffing, Maximilian Bisani, and Paul Vozila. 2013. Improved models for automatic punctuation prediction for spoken and written text. In INTERSPEECH.

Jiangyan Yi and Jianhua Tao. 2019. Self-attention based model for punctuation prediction using word and speech embeddings. In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 7270–7274.

Klaus Zechner. 2002. Automatic summarization of open-domain multiparty dialogues in diverse genres. Computational Linguistics, 28(4):447–485.