<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Automatic Speech Recognition for Spontaneous and Prepared Speech &amp; Speech Emotion Recognition in Portuguese Shared-Tasks at Propor 2022</article-title>
      </title-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>21</volume>
      <issue>2022</issue>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>About the workshop</title>
      <p>The Automatic Speech Recognition for Spontaneous and Prepared Speech &amp; Speech
Emotion Recognition in Portuguese (SE&amp;R 2022) workshop was held in conjunction with the
15th edition of the International Conference on the Computational Processing of Portuguese
(PROPOR 2022). The workshop consisted of two main tracks: Automatic Speech Recognition (ASR)
for spontaneous and prepared speech in Portuguese, and Speech Emotion Recognition (SER)
in Portuguese. Our main objective in proposing this challenge was to promote research in
Portuguese audio processing. Two corpora were provided for use in the SE&amp;R
shared-tasks: CORAA ASR and CORAA SER, where CORAA stands for Corpus of Annotated Audios.
The focus is on the Brazilian Portuguese variant. The CORAA ASR corpus contains 389 hours
of spontaneous and prepared speech, segmented at the utterance level, together with the
respective transcription for each utterance. CORAA SER is a 50-minute corpus for emotion
recognition. All participants received a training set for each corpus. The ground truth for
the test set was made available only during the workshop.</p>
      <p>This edition of the workshop includes six papers describing the datasets and solutions
submitted to the SE&amp;R shared-tasks:</p>
      <p>● Overview of the Automatic Speech Recognition for Spontaneous and Prepared
Speech &amp; Speech Emotion Recognition in Portuguese (SE&amp;R) Shared-tasks at
PROPOR 2022</p>
      <p>Arnaldo Candido Junior, Edresson Casanova, Ricardo Marcacini</p>
      <p>● Domain Specific Wav2vec 2.0 Fine-tuning For The SE&amp;R 2022 Challenge</p>
      <p>Alef Iury S. Ferreira, Gustavo dos Reis Oliveira</p>
      <p>● Pretrained audio neural networks for Speech emotion recognition in Portuguese</p>
      <p>Marcelo Matheus Gauy, Marcelo Finger</p>
      <p>● Transfer Learning and Data Augmentation Techniques applied to Speech Emotion
Recognition in SE&amp;R 2022</p>
      <p>Caroline Alves, Bruno Carlotto, Bruno Dias, Anátale Garcia, Bruno Gianesi, Renan
Izaias, Maria Luiza Morais, Paula Oliveira, Vinícius G. Santos, Rafael Sicoli, Flaviane
R. Fernandes Svartman, Sandra Aluisio, Sidney Leal</p>
      <p>● Speech Emotion Recognition in Portuguese for SofiaFala: SER SofiaFala</p>
      <p>Alexander Scaranti, Douglas Silva, Fernando Meloni, Alessandra Alaniz</p>
      <p>● Transductive Ensemble Learning with Graph Neural Network for Speech Emotion
Recognition</p>
      <p>Eliton L. Scardin Perin, Edson Takashi Matsubara</p>
    </sec>
    <sec id="sec-2">
      <title>Organizing committee</title>
      <p>The SE&amp;R 2022 workshop is promoted by the TaRSila project, which aims to expand
speech datasets for the Brazilian Portuguese language in order to achieve state-of-the-art
results on the following tasks:
(a) automatic speech recognition (ASR), which automatically transcribes speech;
(b) multi-speaker synthesis (TTS), which generates several voices from different speakers;
(c) speaker identification/verification, which selects a speaker from a set of predefined
members, either in a closed-set scenario (speakers seen during the training of the models)
or in an open-set scenario (verification occurs with speakers not seen during the
training of the models); and
(d) voice cloning, which uses a voice dataset of only a few minutes or seconds to train a
voice model with synthesis methods that can then read any text in the target voice.</p>
      <p>The TaRSila project is part of the Natural Language Processing initiative (NLP2) of the
Center for Artificial Intelligence (C4AI) of the University of São Paulo, sponsored by IBM and
FAPESP.</p>
      <p>The following researchers organized the shared tasks:
● Alessandra Alaniz Macedo, FFCLRP/USP, Brazil (Website Chair)
● Arnaldo Candido Jr., UTFPR, Brazil (Program Chair &amp; Conference Chair)
● Edresson Casanova, ICMC/USP, Brazil (Evaluation Chair)
● Flaviane Romani Fernandes Svartman, FFLCH/USP, Brazil (Program Chair)
● Marcelo Finger, IME/USP, Brazil (Conference Chair)
● Ricardo M. Marcacini, ICMC/USP, Brazil (Publication Chair &amp; Conference Chair)
● Sandra Maria Aluísio, ICMC/USP, Brazil (Website Chair)</p>
      <p>Program Committee and External Reviewers</p>
      <p>Acknowledgements</p>
      <p>The SE&amp;R shared-tasks were supported by the Center for Artificial Intelligence
(C4AI-USP), with support from the São Paulo Research Foundation (FAPESP grant #2019/07665-4)
and from the IBM Corporation.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>