=Paper=
{{Paper
|id=Vol-2583/0_Preface
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-2583/0_Preface.pdf
|volume=Vol-2583
}}
==None==
<pdf width="1500px">https://ceur-ws.org/Vol-2583/0_Preface.pdf</pdf>
<pre>
 Hugo Gonçalo Oliveira, Livy Real, and Erick Fonseca (Eds.)


             Proceedings of the


     ASSIN 2 Shared Task
Evaluating Semantic Textual Similarity
and Textual Entailment in Portuguese


          Copyright © 2020 for this paper by its authors.
   Use permitted under Creative Commons License Attribution 4.0
                    International (CC BY 4.0).


Preface


ASSIN 2 is the second edition of the Evaluation of Semantic Similarity and
Textual Inference (Avaliação de Similaridade Semântica e Inferência textual)
in Portuguese, which took place as a parallel event with the STIL conference in
2019. Like the previous edition, it proposed a shared task on Semantic Similarity
and Textual Entailment, with the former rating pairs of sentences on a scale from
1 to 5 and the latter labeling them as either entailment or non-entailment (but
not paraphrase, in contrast with the first edition).
   There are some notable differences between the first and second editions of
the shared task. Concerning the data, a new corpus of 10 thousand sentences
was presented, but instead of text extracted from news articles, it contains much
simpler sentence pairs, modeled after the SICK corpus. With sentences written
specifically for this task, some linguistic phenomena could be directly controlled.
As a result, a word overlap baseline is not as powerful on ASSIN 2 as it was on
ASSIN 1.
   On the systems side, we saw a reflection of the recent development of
neural networks. While hand-engineered features and lexical resources are still
useful, pretrained language models proved very helpful for both tasks.
   This volume presents the main findings of the shared task organizers and
the descriptions of the strategies developed by the participants. With nine
participants in total, we are happy with the results of ASSIN 2. We leave a new
dataset as a benchmark to evaluate the progress of this area in Portuguese, as
well as reflections on its research directions.


   February, 2020


                                                                Erick Fonseca
                                                                     Livy Real
                                                         Hugo Gonçalo Oliveira


Organization

Livy Real                     B2W Digital/Grupo de Linguística Computacional
                              – USP, Brazil
Hugo Gonçalo Oliveira        CISUC / DEI, Universidade de Coimbra, Portugal
Erick Fonseca                 Instituto de Telecomunicações, Lisboa, Portugal


Reviewers

Ana Alves                     CISUC and ISEC, Polytechnic Institute of Coimbra,
                              Portugal
Evandro Fonseca               Stilingue, Brazil
Irene Rodrigues               Laboratório de Informática, Sistemas e Paralelismo
                              (LISP), Departamento de Informática, Universidade
                              de Évora
Jéssica Rodrigues             Department of Computer Science, Federal University
                              of São Carlos, Brazil
Lucas Oliveira                Graduate Program on Health Technology (PPGTS),
                              Pontifical Catholic University of Paraná (PUCPR).
                              Curitiba, Brazil
Marco Sobrevilla Cabezudo NILC - Interinstitutional Center for Computational
                              Linguistics, ICMC, Universidade de São Paulo, São
                              Carlos SP, Brazil
Nádia Félix Felipe da Silva Institute of Informatics, Federal University of Goiás,
                              Brazil
Rui Rodrigues                 Centro de Matemática e Aplicações (CMA), FCT,
                              UNL Departamento de Matemática, FCT, UNL
Valeria de Paiva              Samsung Research America, USA


Table of Contents

Organizing the ASSIN 2 Shared Task
   Livy Real, Erick Fonseca, Hugo Gonçalo Oliveira

ASAPPpy: a Python Framework for Portuguese STS
   José Santos, Ana Alves, Hugo Gonçalo Oliveira

Multilingual Transformer Ensembles for Portuguese Natural Language Tasks
   Ruan Chaves Rodrigues, Jéssica Rodrigues da Silva, Pedro Vitor Quinta
   de Castro, Nádia Félix Felipe da Silva, Anderson da Silva Soares

IPR: The Semantic Textual Similarity and Recognizing Textual Entailment systems
   Rui Rodrigues, Paula Couto, Irene Rodrigues

NILC at ASSIN 2: Exploring Multilingual Approaches
   Marco A. Sobrevilla Cabezudo, Marcio Inácio, Ana Carolina Rodrigues,
   Edresson Casanova, Rogério Figueredo de Sousa

Incorporating multiple feature groups to a Siamese Neural Network for
Semantic Textual Similarity task in Portuguese texts
   João Vitor Andrioli de Souza, Lucas Emanuel Silva e Oliveira, Yohan
   Bonescki Gumiel, Deborah Ribeiro Carvalho, Claudia Maria Cabral
   Moro

Multilingual Transformer Ensembles for Portuguese Natural Language Tasks
   Evandro Fonseca, João Paulo Reis Alvarenga



</pre>