<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Organizing the ASSIN 2 Shared Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Livy Real</string-name>
          <email>livyreal@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erick Fonseca</string-name>
          <email>erick.fonseca@lx.it.pt</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hugo Gonçalo Oliveira</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>B2W Digital/Grupo de Linguística Computacional - University of São Paulo</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>CISUC, University of Coimbra</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Instituto de Telecomunicações</institution>
          ,
          <addr-line>Lisboa</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We describe ASSIN 2, the second edition of a task on the evaluation of Semantic Textual Similarity (STS) and Recognizing Textual Entailment (RTE) in Portuguese. The ASSIN 2 dataset is a collection of sentence pairs annotated with human judgments for textual entailment and semantic similarity. Interested teams could participate in either of the tasks (STS or RTE) or in both. Nine teams participated in STS and eight in RTE. A workshop on this task was collocated with STIL 2019, in Salvador, Brazil. This paper describes the ASSIN 2 task and gives an overview of the participating systems.</p>
      </abstract>
      <kwd-group>
        <kwd>Shared Task</kwd>
        <kwd>Semantic Textual Similarity</kwd>
        <kwd>Recognizing Textual Entailment</kwd>
        <kwd>Natural Language Inference</kwd>
        <kwd>Portuguese</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p><sup>4</sup> <italic>Assim</italic> in Portuguese means ‘in the same way’, so it is arguably an adequate name for a similarity task.</p>
      <p>The similarity scale adopted in ASSIN 2 ranges between 1 and 5, with 1 meaning that the sentences
are totally different and 5 that they have virtually the same meaning. The pair
<italic>O cachorro está pegando uma bola azul</italic> / <italic>Uma bola azul está sendo pega pelo
cachorro</italic><sup>5</sup> is an example of a pair scored with 5, while <italic>A menina está andando de
cavalo</italic> / <italic>O menino está borrifando as plantas com água</italic><sup>6</sup> is scored 1.</p>
      <p>Recognizing Textual Entailment (RTE), or Natural Language
Inference (NLI), is the task of predicting whether a given text entails another (i.e.,
whether a premise implies a hypothesis). The entailment relation holds when, from
the premise [A], we can infer that another sentence [B] is also true. That is,
from [A] we can conclude [B]. For the pair [A] <italic>Um macaco está provocando um
cachorro no zoológico</italic> / [B] <italic>Um cachorro está sendo provocado por um macaco
no zoológico</italic><sup>7</sup>, we say that A entails B. For the pair [A] <italic>Um grupo de meninos
em um quintal está brincando e um homem está de pé ao fundo</italic> / [B] <italic>Os meninos
jovens estão brincando ao ar livre e o homem está sorrindo por perto</italic><sup>8</sup>, however, there is
no entailment relation from A to B<sup>9</sup>.</p>
      <p>
        We follow the tradition of shared tasks for RTE that can be traced back to
2005 with the first Pascal Challenge [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], targeting RTE in a corpus of 1,367 pairs
annotated for entailment and non-entailment relations. Back then, the best
teams (MITRE and Bar Ilan teams) achieved an accuracy of 0.586. In the next
Pascal Challenges, different corpora and task designs were tried: paragraphs were
used instead of short sentences (Challenge 3 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]); contradictions were added
to the data (Extended Challenge 3 [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]); non-aligned texts were given to the
participants (Challenges 6 and 7) and, more recently, the task was presented as
multilingual [
        <xref ref-type="bibr" rid="ref22 ref23">22,23</xref>
        ].
      </p>
      <p>
        Regarding STS, shared tasks for English go back to SemEval 2012 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
Recently, in 2017 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Arabic and Spanish were also included. SemEval 2014 also
included a related task on compositionality that combined Semantic
Relatedness and Textual Entailment [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], which we modeled our dataset after.
For both STS and RTE, this task used the SICK corpus (‘Sentences Involving
Compositional Knowledge’) as its data source, the first corpus in the order of
10,000 sentence pairs annotated for inference.
      </p>
      <p>
        In 2015, SNLI, a corpus with more than 500,000 human-written English
sentences annotated for NLI, was released [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and, in 2017, RepEval [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] included
the MultiNLI corpus, with more than 430,000 pairs annotated for NLI, covering
different textual genres.
      </p>
      <p><sup>5</sup> The dog is catching a blue ball / A blue ball is being caught by the dog.</p>
      <p><sup>6</sup> The girl is riding the horse / The boy is spraying the plants with water.</p>
      <p><sup>7</sup> A monkey is teasing a dog at the zoo / A dog is being teased by a monkey at the zoo.</p>
      <p><sup>8</sup> A group of boys in a backyard are playing and a man is standing in the background / Young boys are playing outdoors and the man is smiling nearby.</p>
      <p><sup>9</sup> One could think there is an entailment relation between these sentences, since
‘meninos’ (boys) are always ‘meninos jovens’ (young boys) and the
man standing would probably also be smiling. However, since it is equally possible that the man
nearby is not smiling, this pair is considered a non-entailment: it is possible
that the two scenes happen at the same time, but it is not necessary.</p>
      <p>
        When it comes to Portuguese processing, data availability and shared tasks
for semantic processing are still starting to become popular. In 2016, ASSIN [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
was the first shared task for Portuguese STS and RTE. Its dataset included
10,000 pairs of annotated sentences, half in European Portuguese and half in
Brazilian Portuguese. ASSIN 2 follows the goal of ASSIN by offering a new
computational semantic benchmark to the community interested in computational
processing of Portuguese.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Task and Data Design</title>
      <p>When designing ASSIN 2, we considered the previous experience of ASSIN and
made some changes towards an improved task. This section describes the data
used in the ASSIN 2 collection, its annotation process, decisions taken and the
main differences from ASSIN 1. It ends with a brief schedule of ASSIN 2.</p>
      <sec id="sec-2-1">
        <title>Data Source</title>
        <p>
          The ASSIN 1 dataset is based on news and poses several linguistic challenges,
such as temporal expressions and reported speech. Following the reflections of the
ASSIN 1 organization [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], we opted for a corpus specifically created for
the tasks, like SNLI and MultiNLI, and containing only simple facts, like SICK.
Therefore, the ASSIN 2 data was based on SICK-BR [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], a translation and
manual adaptation of SICK [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ], the corpus used in SemEval 2014, Task 1. SICK
is known to be based on captions of pictures and to have fewer complex linguistic
phenomena, which suits our purposes well. Since the ASSIN 2 collection is built
upon SICK-BR, it only contains the Brazilian variant of Portuguese.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Data Balancing</title>
        <p>
          Another goal considered in data design was to have a balanced corpus in terms
of RTE labels. Both ASSIN and SICK-BR data are unbalanced, offering many
more neutral pairs than entailments. Even if this is more representative of
actual language usage, it is undesirable for machine learning
techniques. Since SICK-BR has fewer than 25% of entailment pairs, we had to create
and annotate more of them. To create such pairs we followed a semi-automated
strategy, starting from entailment SICK-BR pairs and replacing words with synonyms or
removing adverbial or adjectival phrases. All generated pairs were
manually revised. We also manually created pairs that we expected to be annotated
as entailments, trying as much as possible not to introduce artifact bias [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Data Annotation</title>
        <p>All the annotators involved in the ASSIN 2 annotation task have linguistic
training, being professors, linguistics students or computational linguists. We
would like to express our deepest appreciation to Alissa Canesim
(Universidade Estadual de Ponta Grossa), Amanda Oliveira (Stilingue), Ana Cláudia
Zandavalle (Stilingue), Beatriz Mastrantonio (Stilingue - Universidade Federal
de Ouro Preto), Carolina Gadelha (Stilingue), Denis Owa (Pontifícia
Universidade Católica de São Paulo), Evandro Fonseca (Stilingue), Marcos Zandonai
(Universidade do Vale do Rio dos Sinos), Renata Ramisch (Núcleo
Interinstitucional de Linguística Computacional - Universidade Federal de São Carlos), and
Talia Machado (Universidade Estadual de Ponta Grossa) for taking part in this
task with the single purpose of producing open resources for the community
interested in the computational processing of Portuguese. We also thank the Group
of Computational Linguistics from the University of São Paulo for making available
SICK-BR, which served as the base for the ASSIN 2 corpus.</p>
        <p>All the pairs created for ASSIN 2 were annotated by at least four native
speakers of Brazilian Portuguese. The annotation task was conducted using an online
tool prepared for the RTE and STS tasks, the same as in ASSIN 1.</p>
        <p>For the RTE task, only pairs annotated the same way by the majority of
the annotators were actually used in the dataset. This means that at least three
of four annotators agreed on the RTE labels present in the ASSIN 2 collection. For
the STS task, the label is the average of the scores given by all the annotators.
The final result was a dataset with about 10,000 sentence pairs: 6,500 used
for training, 500 for validation, and 2,448 for test, now available at
https://sites.google.com/view/assin2/.</p>
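        <p>The aggregation rule just described can be sketched as follows. This is only an illustration under stated assumptions, not the organizers' annotation tooling: the function name <monospace>aggregate_pair</monospace>, the label strings and the list-based input are hypothetical; only the rule itself (an RTE label is kept when at least three of four annotators agree, and the STS label is the average score) comes from the text above.</p>
        <preformat>
```python
from collections import Counter
from statistics import mean


def aggregate_pair(rte_labels, sts_scores, min_agreement=3):
    """Return (rte_label, sts_score), or None if agreement is too low.

    rte_labels and sts_scores hold one entry per annotator; this input
    layout is an assumption made for the sketch.
    """
    # Most frequent RTE label and the number of annotators who chose it.
    label, votes = Counter(rte_labels).most_common(1)[0]
    if votes >= min_agreement:
        # The STS gold score is the plain average over all annotators.
        return label, mean(sts_scores)
    return None  # pair discarded: no sufficient majority


# Hypothetical annotations for two pairs, four annotators each.
kept = aggregate_pair(["Entailment"] * 3 + ["None"], [5, 4, 5, 4])
dropped = aggregate_pair(["Entailment", "Entailment", "None", "None"], [3, 3, 2, 4])
print(kept, dropped)
```
        </preformat>
        <p>In this toy input the first pair is kept with the majority label and the mean score, while the second, split two against two, would be left out of the collection.</p>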
        <p>Since we wanted to have a balanced corpus and a sound annotation strategy,
we opted for having only two RTE labels: entailment and non-entailment.
Differently from ASSIN 1, we did not use the paraphrase label, because a
paraphrase is simply a double entailment, which makes a third label for it
unnecessary. This was further motivated
by the results of ASSIN 1, where systems had much difficulty outperforming
the proposed baselines, which were the same as in ASSIN 2. For example, no
participant run did better than the RTE baseline in Brazilian Portuguese. Thus, we
decided to pursue a new task design having in mind its utility to the community.</p>
        <p>
          In fact, our original intent was to follow a tradition in inference that pays
attention to contradictions as much as to entailments, as Zaenen et al. [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] and de Marneffe et al. [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ], as well as most recent datasets. However, having a soundly
annotated corpus for contradictions is not a trivial task. Firstly, defining
contradictions and having functional guidelines for the phenomenon is a task on
its own. While recent datasets aim to have a “human” perspective of the
phenomenon [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], semanticists and logicians have already pointed out that this lay
perspective on contradictions can lead to much noise in inference annotation<fn><label>10</label><p>See the Crouch et al.-Manning controversy for details on this point [<xref ref-type="bibr" rid="ref18 ref28 ref6">28,18,6</xref>].</p></fn>,
especially when annotating contradictions. For example, the work
of Kalouli et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] shows that almost 50% of the contradictions in the SICK
dataset, around 15% of all the pairs, do not follow the basic ‘logical’
assumption that, if the premise (sentence A) contradicts the hypothesis (sentence B),
the hypothesis (B) must also contradict the premise (A). After all,
contradictions should be symmetric. Secondly, considering that we used SICK-BR as the
base of our dataset, we would have needed to correct all the contradictions that
were already in SICK, following Kalouli et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], who find many
inconsistencies in the annotation of contradictions. Another reason for excluding contradictions in
ASSIN 2 is that the corpus would then not be balanced across the labels,
since SICK (and SICK-BR) has fewer than 1,500 contradictions in a corpus of
10,000 pairs.
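        </p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Premise</th><th>Hypothesis</th><th>RTE</th><th>STS</th></tr>
            </thead>
            <tbody>
              <tr><td>O cachorro castanho está correndo na neve</td><td>Um cachorro castanho está correndo na neve</td><td>Entails</td><td>5</td></tr>
              <tr><td>Alguns animais estão brincando selvagemente na água</td><td>Alguns animais estão brincando na água</td><td>Entails</td><td>4.4</td></tr>
              <tr><td>Dois meninos jovens estão olhando para a câmera e um está pondo sua língua para fora</td><td>Duas jovens meninas estão olhando uma câmera e uma está com a língua para fora</td><td>None</td><td>3.7</td></tr>
              <tr><td>A menina jovem está soprando uma bolha que é grande</td><td>Não tem nenhuma menina de rosa girando uma fita</td><td>None</td><td>2.1</td></tr>
              <tr><td>Um avião está voando</td><td>Um cachorro está latindo</td><td>None</td><td>1</td></tr>
            </tbody>
          </table>
        </table-wrap>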
        <preformat>&lt;pair entailment="Entailment" id="681" similarity="5"&gt;
  &lt;t&gt;O cachorro castanho está correndo na neve&lt;/t&gt;
  &lt;h&gt;Um cachorro castanho está correndo na neve&lt;/h&gt;
&lt;/pair&gt;</preformat>
        <p>ASSIN 2 was announced in May 2019, on several NLP mailing lists. In June
2019, a Google Group was created for communication between the
organization and participants or other interested people
(https://groups.google.com/forum/#!forum/assin2). Training and validation data were released on 16th
June and testing data on 16th September, which also marked the beginning of
the evaluation period. The deadline for result submission was 10 days later, on
26th September, and the official results were announced a few days after this.</p>
        <p>A physical workshop where ASSIN 2 and some of the participating systems
were presented was held on 15th October 2019, in Salvador, Brazil, collocated with the
STIL 2019 symposium<sup>11</sup>. On 2nd March 2020, participants were given a second opportunity
to present their work in the POP2 workshop, in Évora, Portugal,
collocated with the PROPOR 2020 conference<sup>12</sup>.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Metrics</title>
        <p>As in ASSIN 1 and in other shared tasks with the same goal, systems’
performance on the RTE task was measured with the macro F1 of precision and
recall as the main metric. For STS, performance was measured with the Pearson
correlation coefficient (ρ) between the gold and the submitted scores, with Mean
Squared Error (MSE) computed as a secondary metric. The evaluation scripts
can be found at https://github.com/erickrf/assin.</p>
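        <p>As a rough illustration of the three measures named above, the sketch below computes macro F1, Pearson ρ and MSE in plain Python. It is not the official evaluation script, and the function names and the toy data are assumptions made for this example.</p>
        <preformat>
```python
from math import sqrt


def macro_f1(gold, pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in sorted(set(gold)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def pearson(gold, pred):
    """Pearson correlation between gold and predicted similarity scores."""
    n = len(gold)
    mean_g, mean_p = sum(gold) / n, sum(pred) / n
    cov = sum((g - mean_g) * (p - mean_p) for g, p in zip(gold, pred))
    var_g = sum((g - mean_g) ** 2 for g in gold)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / sqrt(var_g * var_p)


def mse(gold, pred):
    """Mean squared error between gold and predicted similarity scores."""
    return sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold)


# Toy data (hypothetical), only to exercise the functions.
gold_rte = ["Entailment", "Entailment", "None", "None"]
pred_rte = ["Entailment", "None", "None", "None"]
gold_sts = [1.0, 2.0, 3.0, 4.0, 5.0]
pred_sts = [2.0, 3.0, 4.0, 5.0, 6.0]  # every score is off by exactly 1
print(macro_f1(gold_rte, pred_rte))
print(pearson(gold_sts, pred_sts), mse(gold_sts, pred_sts))
```
        </preformat>
        <p>In the toy STS data the predictions are perfectly correlated with the gold scores although each one is off by a full point, which shows why the two STS measures can rank systems differently.</p>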
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Participants and Results</title>
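      <p>ASSIN 2 had a total of nine participating teams, five from Portugal and four
from Brazil, namely:
        <list list-type="bullet">
          <list-item><p>CISUC-ASAPPj (Portugal)</p></list-item>
          <list-item><p>CISUC-ASAPPpy (Portugal)</p></list-item>
          <list-item><p>Deep Learning Brasil (Brazil)</p></list-item>
          <list-item><p>IPR (Portugal)</p></list-item>
          <list-item><p>L2F/INESC (Portugal)</p></list-item>
          <list-item><p>LIACC (Portugal)</p></list-item>
          <list-item><p>NILC (Brazil)</p></list-item>
          <list-item><p>PUCPR (Brazil)</p></list-item>
          <list-item><p>Stilingue (Brazil)</p></list-item>
        </list>
      </p>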
      <p>Each team could submit up to three runs and participate in both STS and
RTE, or in only one of them. Moreover, each team could participate without
attending the workshop, held in Salvador. We believe this was an
important point for increasing participation, because travelling expenses can be
high, especially for those coming from Europe. The main drawback
was that only four teams actually presented their approaches in the ASSIN 2
workshop, namely CISUC-ASAPP, CISUC-ASAPPpy, Deep Learning Brasil and
Stilingue. On the other hand, a total of six teams submitted a paper describing
their participation, to be included in this volume.</p>
      <p>The results of the runs submitted by each team in the STS and RTE tasks are
shown in Table 2, together with three baselines.</p>
      <p>[Table 2: per-run STS and RTE results for CISUC-ASAPPj, CISUC-ASAPPpy, Deep Learning Brasil, IPR, L2F/INESC, LIACC, NILC, PUCPR and Stilingue, plus the WordOverlap, BoW sentence 2 and Infernal baselines.]</p>
      <p>
        Considering the Pearson correlation (ρ), the best result in STS (0.826) was
achieved by the first run submitted by the IPR team, although the best MSE
was by the second run of Stilingue (0.39). We highlight that these were the only
teams with ρ higher than 0.8. Although Pearson ρ was used as the main metric,
this metric and the MSE are two different ways of analysing the results. A high
ρ means that the ranking of most similar pairs is closer to the one in the gold
standard, while a low MSE means that the similarity scores are closer to the gold
ones. Both the best MSE and the best values of ρ are significantly better than
the best results achieved in ASSIN 1, both in the official evaluation (ρ = 0.73 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ])
and in post-evaluation experiments (ρ = 0.75 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]).
      </p>
      <p><sup>11</sup> http://comissoes.sbc.org.br/ce-pln/stil2019/</p>
      <p><sup>12</sup> https://sites.google.com/view/pop2-propor2020</p>
      <p>On RTE, Deep Learning Brasil had the best run (second run), considering
both F1 and Accuracy, though not very far from IPR, Stilingue and NILC.
Again, the values achieved are higher than the best official results in ASSIN 1.</p>
      <p>The globally higher performances suggest that, when compared to ASSIN 1,
ASSIN 2 was an easier task. This might, indeed, be true, especially considering
that, for RTE, the ASSIN 2 collection only used two labels, with paraphrases
being labelled as entailment and thus not “competing” with it. ASSIN 2 data was also
designed to be simpler and to avoid complex linguistic phenomena. Another point
to keep in mind when comparing ASSIN 1 and ASSIN 2 is that, in this edition,
competitors had access to a balanced corpus. This might have also contributed
to the better performance of systems on ASSIN 2 data. Still, we should also
consider that, in the last two years, NLP has seen substantial advances in
the representation of sentences and their meaning, which led to significant
improvements in many tasks.</p>
      <sec id="sec-3-1">
        <title>Approaches</title>
        <p>
          Approaches followed by the participants show that the Portuguese NLP
community is quickly adopting the most recent trends, with several teams (IPR,
Deep Learning Brasil, L2F/INESC, Stilingue and NILC), including those with
the best results, somehow exploring BERT [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] contextual embeddings, some
of which (IPR, NILC) were fine-tuned for ASSIN 2. Some teams combined these
with other features commonly used in STS / RTE, including string
similarity measures (e.g., Jaccard for tokens, token n-grams and character
n-grams), agreement in negation and sentiment, lexical-semantic relations
(synonymy and hyponymy), as well as pre-trained classic word embeddings (e.g.,
word2vec, GloVe, fastText, all available for Portuguese as part of the NILC
embeddings [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]). Besides BERT, non-pretrained neural models, namely LSTM
Siamese Networks (PUCPR) and Transformers (Stilingue), were also used, while
a few teams (ASAPPpy, ASAPPj) followed a more classic machine learning
approach and learned a regressor from some of the previous features. Models were
trained not only on the ASSIN 2 training collection, but also on data from ASSIN 1.
        </p>
        <p>To achieve the best Pearson ρ, the IPR team relied on a pre-trained multilingual
BERT model, made freely available by the developers of BERT, which they fine-tuned
on large Portuguese corpora. A neural network was built by adding one layer
to the resulting BERT model and trained with ASSIN 1 (train and test) and
ASSIN 2 (train) data.</p>
        <p>
          Stilingue relied on the exploration of Transformers, trained with BERT [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]
features, plus a set of 18 additional features covering sentiment and
negation agreement, synonyms and hyponyms according to Onto.PT [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] and
VerbNet [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], similarity, gender and number agreement, Jaccard similarity of shared
tokens, verb tense, presence of the conjunction ‘e’ (and), similar and different
tokens, sentence subject, and cosine of sentence embeddings computed with
FastText [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>
          For RTE, the best run, by Deep Learning Brasil, was based on an ensemble of
multilingual BERT and RoBERTa [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], which improves on the results of BERT,
both fine-tuned for the ASSIN 2 data. However, for RoBERTa, this data was
previously translated to English, with Google Translate. The IPR team also
relied on BERT and used ASSIN 1 data to fine-tune the model.
        </p>
        <p>
          Our first baseline was the word overlap, which had very competitive results
in ASSIN 1. It counts the ratio of overlapping tokens in both the first and
second sentence, and trains a logistic/linear regressor (for RTE/STS) with these
two features. A second baseline is inspired by Gururangan et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and trains
the same algorithms on bag-of-words features extracted only from the second
sentence of each pair. It aims to detect biases in the construction of the dataset.
For RTE, a third baseline was considered, namely, Infernal [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], a system based
on hand-designed features, which has state-of-the-art results on ASSIN 1.
        </p>
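        <p>The two features of the word overlap baseline can be sketched as below. This is a minimal illustration and not the organizers' code: the function name <monospace>overlap_features</monospace>, the lowercased whitespace tokenization and the use of token sets rather than token counts are all assumptions. The regressor that the baseline trains on these two features is left out.</p>
        <preformat>
```python
def overlap_features(premise, hypothesis):
    """Return the share of overlapping tokens in each sentence."""
    # Lowercased whitespace tokenization is an assumption of this sketch.
    tokens_a = set(premise.lower().split())
    tokens_b = set(hypothesis.lower().split())
    shared = tokens_a.intersection(tokens_b)
    ratio_a = len(shared) / len(tokens_a) if tokens_a else 0.0
    ratio_b = len(shared) / len(tokens_b) if tokens_b else 0.0
    return ratio_a, ratio_b


# Pair taken from the examples shown earlier in the paper.
features = overlap_features(
    "Alguns animais estão brincando selvagemente na água",
    "Alguns animais estão brincando na água",
)
print(features)
```
        </preformat>
        <p>For this pair all six hypothesis tokens occur in the premise, while the premise has one extra token, so the second ratio is 1 and the first is 6/7.</p>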
      </sec>
      <sec id="sec-3-2">
        <title>Results in Harder Pairs</title>
        <p>
          Similar to Gururangan et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], we took all the pairs misclassified by our
second baseline and called them a hard subset of the data. In other words, these
pairs were not correctly classified by only looking at the hypothesis, the second
sentence of the pair. In order to provide an alternative view on the results, we
analysed the participants’ results on this subset.
        </p>
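        <p>Selecting the hard subset amounts to a simple filter, sketched below under stated assumptions: pairs are represented as dictionaries with an <monospace>entailment</monospace> key holding the gold label, the baseline predictions come as a parallel list, and the function name and the toy pairs are hypothetical.</p>
        <preformat>
```python
def hard_subset(pairs, baseline_predictions):
    """Keep the pairs whose gold RTE label the baseline got wrong."""
    return [
        pair
        for pair, predicted in zip(pairs, baseline_predictions)
        if pair["entailment"] != predicted
    ]


# Hypothetical gold labels and hypothesis-only baseline predictions.
pairs = [
    {"id": "1", "entailment": "Entailment"},
    {"id": "2", "entailment": "None"},
    {"id": "3", "entailment": "None"},
]
baseline_predictions = ["Entailment", "Entailment", "None"]
hard = hard_subset(pairs, baseline_predictions)
print([pair["id"] for pair in hard])
```
        </preformat>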
        <p>
          Results are shown in Table 3. Though worse than the performance on the
full collection, in Table 2, the differences are not as large as those reported by
Gururangan et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. This is not surprising, as the second baseline had an F1
score only marginally above chance level<sup>13</sup>, indicating that the dataset does not
suffer from annotation artifacts as seriously as SNLI.
        </p>
        <p>A particular outlier is the second run of IPR in STS, but its highly differing
value is due to an already very low Pearson ρ on the original data. Still, eight
runs had a decrease of more than 15% in RTE, suggesting they might have been
exploiting some bias in the collection.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>
        ASSIN 2 was the second edition of ASSIN, a shared task targeting Recognizing
Textual Entailment / Natural Language Inference and Semantic Textual
Similarity in Portuguese. It had nine participating teams, from Portugal and Brazil,
and, differently from the previous ASSIN edition [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], most of the systems
outperformed the proposed baselines. We believe that the effort of having a simpler
task in ASSIN 2 was beneficial, not only because systems could do better in this
edition, but also because the ASSIN 2 corpus has a sound annotation strategy,
comparable with previous shared tasks for English. Looking at the participation,
it seems that the Portuguese processing community is now more interested in
the proposed tasks.
      </p>
      <p><sup>13</sup> For comparison, Gururangan et al. [<xref ref-type="bibr" rid="ref14">14</xref>] had 67% accuracy in a dataset with three classes.</p>
      <p>[Table 3: results on the hard subset (including RTE accuracy) for ASAPPj, ASAPPpy, IPR, LIACC, NILC, PUCPR, L2F/INESC, Deep Learning Brasil (runs 1 to 3) and Stilingue.]</p>
      <p>On the results achieved, it is notable that systems based on transfer learning
had better results in the competition for both tasks. A note should be added on
the Deep Learning Brasil team, which achieved the best scores for RTE with a
strategy based on translating the data to English, to make possible the use of
more powerful models. However, it is possible that the nature of the data, which
is a translated and adapted version of SICK, makes this strategy more sound
than it would be in real-world scenarios. Overall, ASSIN 2 results may indicate
how the pre-trained language models used, namely BERT and RoBERTa, rapidly
improved the state of the art of a given task. For the future, we would like to
discuss new ways of evaluating the generalization power of the proposed systems,
since intrinsic metrics, considering only a subset of the data that follows exactly
the same format as the training data, no longer seem to be enough to
effectively evaluate the systems’ performance.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SemEval-2012 task 6: A pilot on semantic textual similarity</article-title>
          .
          <source>In: Proc. 1st Joint Conf. on Lexical and Computational Semantics-Vol. 1: Proc. of main conference and shared task, and Vol. 2: Proc. of Sixth Intl. Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>385</fpage>
          -
          <lpage>393</lpage>
          . Association for Computational Linguistics (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Gonçalo Oliveira, H.,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          Encarnação, R.:
          <article-title>ASAPP 2.0: Advancing the state-of-the-art of semantic textual similarity for Portuguese</article-title>
          .
          <source>In: Proceedings of 7th Symposium on Languages, Applications and Technologies (SLATE</source>
          <year>2018</year>
          ).
          <source>OASIcs</source>
          , vol.
          <volume>62</volume>
          , pp.
          <volume>12</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          :
          <fpage>17</fpage>
          .
          Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (
          <year>June 2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bowman</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angeli</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potts</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.:</given-names>
          </string-name>
          <article-title>A large annotated corpus for learning natural language inference</article-title>
          .
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>632</fpage>
          -
          <lpage>642</lpage>
          . Association for Computational Linguistics, Lisbon, Portugal (Sep
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bowman</surname>
            ,
            <given-names>S.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angeli</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Potts</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.:</given-names>
          </string-name>
          <article-title>A large annotated corpus for learning natural language inference</article-title>
          .
          <source>In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          .
          Association for Computational Linguistics (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Gazpio</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specia</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation</article-title>
          .
          <source>In: Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          . Association for Computational Linguistics (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Crouch</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karttunen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaenen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Circumscribing is not excluding: A response to Manning</article-title>
          . http://web.stanford.edu/~laurik/publications/reply-to-manning.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dagan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glickman</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magnini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The PASCAL recognising textual entailment challenge</article-title>
          .
          <source>Machine Learning Challenges. Evaluating Predictive Uncertainty, Visual Object Classification, and Recognizing Textual Entailment</source>
          . (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . Association for Computational Linguistics, Minneapolis, Minnesota (Jun
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Fialho</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marques</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martins</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Coheur</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Quaresma</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>INESC-ID@ASSIN: Medição de similaridade semântica e reconhecimento de inferência textual</article-title>
          .
          <source>Linguamática 8(2)</source>
          ,
          <fpage>33</fpage>
          -
          <lpage>42</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aluísio</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          :
          <article-title>Syntactic knowledge for natural language inference in Portuguese</article-title>
          . In:
          <string-name>
            <surname>Villavicencio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moreira</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abad</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Caseli</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gamallo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramisch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonçalo Oliveira</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paetzold</surname>
            ,
            <given-names>G.H.</given-names>
          </string-name>
          (eds.)
          <source>Computational Processing of the Portuguese Language</source>
          . pp.
          <fpage>242</fpage>
          -
          <lpage>252</lpage>
          . Springer, Cham (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Criscuolo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aluísio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Visão geral da avaliação de similaridade semântica e inferência textual</article-title>
          .
          <source>Linguamática 8(2)</source>
          ,
          <fpage>3</fpage>
          -
          <lpage>13</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Giampiccolo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Magnini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dagan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dolan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The third PASCAL recognizing textual entailment challenge</article-title>
          .
          <source>In: Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          . Association for Computational Linguistics, Prague (Jun
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Gonçalo Oliveira</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gomes</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>ECO and Onto.PT: A flexible approach for creating a Portuguese wordnet automatically</article-title>
          .
          <source>Language Resources and Evaluation</source>
          <volume>48</volume>
          (
          <issue>2</issue>
          ),
          <fpage>373</fpage>
          -
          <lpage>393</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Gururangan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Swayamdipta</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwartz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>Annotation artifacts in natural language inference data</article-title>
          .
          <source>In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>2</volume>
          (Short Papers). pp.
          <fpage>107</fpage>
          -
          <lpage>112</lpage>
          . Association for Computational Linguistics, New Orleans, Louisiana (Jun
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hartmann</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>E.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shulby</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Treviso</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aluísio</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          :
          <article-title>Portuguese word embeddings: Evaluating on word analogies and natural language tasks</article-title>
          .
          <source>In: Proceedings of the 11th Brazilian Symposium in Information and Human Language Technology (STIL 2017)</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kalouli</surname>
            ,
            <given-names>A.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Real</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Paiva</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Correcting contradictions</article-title>
          .
          <source>In: Proceedings of the Computing Natural Language Inference (CONLI) Workshop, 19 September 2017</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Levy</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stoyanov</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          . arXiv preprint arXiv:1907.11692
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Local textual inference: It's hard to circumscribe, but you know it when you see it - and NLP needs it</article-title>
          . https://nlp.stanford.edu/~manning/papers/LocalTextualInference.pdf (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Marelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bentivogli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baroni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernardi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zamparelli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>SemEval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment</article-title>
          .
          <source>In: Proc. of 8th Intl. Workshop on Semantic Evaluation (SemEval 2014)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . Association for Computational Linguistics, Dublin, Ireland (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
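          20.
          <string-name>
            <surname>de Marneffe</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rafferty</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          :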
          <article-title>Finding contradictions in text</article-title>
          .
          <source>In: Proceedings of ACL-08: HLT</source>
          . pp.
          <fpage>1039</fpage>
          -
          <lpage>1047</lpage>
          . Association for Computational Linguistics, Columbus, Ohio (Jun
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Nangia</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lazaridou</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bowman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The RepEval 2017 shared task: Multi-genre Natural Language Inference with sentence representations</article-title>
          .
          <source>In: Proceedings of the 2nd Workshop on Evaluating Vector Space Representations for NLP</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          . Association for Computational Linguistics, Copenhagen, Denmark (Sep
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Negri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marchetti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehdad</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bentivogli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giampiccolo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>SemEval-2012 task 8: Cross-lingual textual entailment for content synchronization</article-title>
          .
          <source>In: Proceedings of *SEM</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Negri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marchetti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mehdad</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bentivogli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giampiccolo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>SemEval-2013 task 8: Cross-lingual textual entailment for content synchronization</article-title>
          .
          <source>In: Proceedings of *SEM</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Real</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oliveira</surname>
            ,
            <given-names>H.G.</given-names>
          </string-name>
          :
          <article-title>The ASSIN 2 shared task: a quick overview</article-title>
          .
          <source>In: Computational Processing of the Portuguese Language - 14th International Conference, PROPOR 2020, Évora, Portugal, March 2-4, 2020, Proceedings</source>
          . In press. LNCS, Springer (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Real</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vieira</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Albiero</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thalenberg</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guide</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Silva</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Oliveira Lima</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Câmara</surname>
            ,
            <given-names>I.C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stanojević</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Souza</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Paiva</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>SICK-BR: A Portuguese corpus for inference</article-title>
          .
          <source>In: Proceedings of 13th PROPOR</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Scarton</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aluísio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Towards a cross-linguistic VerbNet-style lexicon for Brazilian Portuguese</article-title>
          .
          <source>In: Proceedings of LREC 2012 Workshop on Creating Crosslanguage Resources for Disconnected Languages and Styles</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Voorhees</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          :
          <article-title>Contradictions and justifications: Extensions to the textual entailment task</article-title>
          .
          <source>In: Proceedings of ACL-08: HLT</source>
          . pp.
          <fpage>63</fpage>
          -
          <lpage>71</lpage>
          . Association for Computational Linguistics, Columbus, Ohio (Jun
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Zaenen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karttunen</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crouch</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Local textual inference: Can it be defined or circumscribed?</article-title>
          <source>In: Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment</source>
          . pp.
          <fpage>31</fpage>
          -
          <lpage>36</lpage>
          . Association for Computational Linguistics, Ann Arbor, Michigan (Jun
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>