<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Wide And Deep Transformers Applied to Semantic Relatedness and Textual Entailment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Evandro Fonseca</string-name>
          <email>evandro@stilingue.com.br</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>João Paulo Reis Alvarenga</string-name>
          <email>joaopaulo@stilingue.com.br</email>
        </contrib>
      </contrib-group>
      <fpage>68</fpage>
      <lpage>76</lpage>
      <abstract>
        <p>In this paper we present our approach to semantic relatedness and textual entailment, the two tasks proposed in ASSIN-2 (second evaluation of semantic relatedness and textual entailment). We develop 18 features that explore lexical, syntactic, and semantic information. To train the models, we apply both traditional supervised machine learning and an architecture based on Wide and Deep learning. Our proposal proved competitive with current state-of-the-art models and with the other participant models for Portuguese, mainly when the mean squared error is considered.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Relatedness</kwd>
        <kwd>Textual Entailment</kwd>
        <kwd>Natural Language Processing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Semantic relatedness and textual entailment are great challenges because they depend on many processing levels, such as part-of-speech tagging, sentiment analysis, and coreference resolution, among others. Moreover, when we deal with less-resourced languages like Portuguese, these challenges are even greater, due to the lack of dense semantic bases such as YAGO [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and FrameNet [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The paper is organized as follows: Section 2 presents related work; in Section 3 we describe our proposed models; in Section 4 we show a summary of the official ASSIN-2 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] results; Section 5 presents an error analysis; and Section 6 presents conclusions and future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Transfer learning is widely used in many NLP tasks, such as Sentiment Analysis [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], Text Classification [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], and Question Answering [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], among others. The reason is clear: transfer learning may significantly improve NLP models [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and the SR and TE tasks are no different. In 2018, Devlin et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] proposed an approach based on transfer learning (BERT) to solve the SR and TE tasks for English, achieving a Pearson correlation of 0.865 for SR and an F1 of 70.1% for TE on the GLUE benchmark [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. In 2019, other works based on the BERT architecture emerged (also for English), such as RoBERTa [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], with a Pearson correlation of 0.922 for SR and an F1 of 88.2% for TE, and ALBERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], with a Pearson correlation of 0.925 for SR and an F1 of 89.2% for TE. The current state of the art for Portuguese is Fonseca&#8217;s work [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Fonseca proposed the use of neural networks and syntactic tree distances to solve the SR and TE tasks; his model achieved a Pearson correlation of 0.577 for SR and an F1 of 74.2% for TE on the ASSIN-2 corpus. In this competition, his model is taken as the baseline. In our approach, we propose the use of a machine learning architecture named Wide and Deep, which consists of unifying handcrafted features with dense features. Cheng et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed the use of Wide and Deep models for recommender systems, with encouraging results. In addition, we believe that handcrafted features, based on linguistic knowledge, NLP techniques, and a corpus study, may outperform purely deep learning features. However, when we apply handcrafted features alone, the results may not be as good. To show this, we train and test models using both the Wide and Deep architecture and a traditional supervised machine learning setup. The results show that the Wide and Deep architecture can significantly outperform traditional machine learning with handcrafted features.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Proposed model</title>
      <p>
        To address the semantic relatedness and textual entailment problems we propose eighteen features, which explore lexical, syntactic, and semantic information. In addition, we use a Wide and Deep Transformer architecture, which mixes our proposed features with deep learning features. In subsection 3.1 we present our set of proposed features, which is based on related work [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and was also designed empirically, through a case study of the ASSIN-2 training set (available at https://sites.google.com/view/assin2/).
      </p>
      <p>
        Features
        1. Sentiment Agreement: returns true when both sentences agree in sentiment polarity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and false otherwise, as in the example below.
      </p>
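      <p>
        A minimal sketch of this feature, using a toy polarity lexicon: the word lists below are illustrative assumptions, standing in for the Portuguese sentiment lexicon of [1], and the whitespace tokenization is a simplification.
      </p>

```python
# Sketch of the Sentiment Agreement feature with a toy polarity lexicon.
# The real system uses a Portuguese sentiment lexicon [1]; the word lists
# below are illustrative stand-ins, not the actual resource.
POSITIVE = {"bom", "feliz", "alegre", "comendo"}
NEGATIVE = {"mordendo", "triste", "ruim"}

def polarity(tokens):
    """Return +1, -1, or 0 from a naive lexicon vote."""
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return (score > 0) - (score < 0)

def sentiment_agreement(sent_a, sent_b):
    """True when both sentences share the same polarity sign."""
    return polarity(sent_a.lower().split()) == polarity(sent_b.lower().split())

# The pair from the example disagrees in polarity (+ vs -):
print(sentiment_agreement("O animal está comendo",
                          "O animal está mordendo uma pessoa"))  # False
```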
      <p>
        – O animal está comendo – The animal is eating (+)
        – O animal está mordendo uma pessoa – The animal is biting a person (-)
        2. Negation Agreement: returns true when both sentences agree in the co-occurrence of negative terms or expressions, such as "jamais", "nada", "nenhum", "ninguém", "nunca", "não" (never, nothing, none, nobody, never, not), among others. This feature is very relevant for textual entailment. It helps in cases such as:
        – O menino está pulando – The boy is jumping
        – Ninguém está pulando – Nobody is jumping
        3. Synonym: returns the number of synonyms shared by the two sentences. To identify them we use Onto.PT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. This feature helps the semantic relatedness process, because synonyms are used to refer to the same entity, as in the example below:
        – Um garoto está fazendo um discurso – A young man is giving a speech
        – Um menino está falando – A boy is talking
        4. Hyponym: returns the number of hyponyms shared by the two sentences. As in the Synonym feature, we use Onto.PT to identify the semantic relations.
        5. Verb Similarity: returns the number of similar verbs between the two sentences. To recognize them, Onto.PT and VerbNet.Br [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] were used. It helps to identify pairs such as:
        – Uma menina está caminhando – A girl is stepping
        – Uma menina está andando – A girl is walking
        6. Nouns Similarity: returns the number of similar nouns between the two sentences. Here we use the synonymy relation provided by Onto.PT and lexical similarity (when two words are exactly equal).
      </p>
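      <p>
        Feature 2 reduces to checking whether both sentences agree on containing a term from a closed list. A minimal sketch, using only the negation terms listed above (the full system may also cover multi-word expressions, and the whitespace tokenization is a simplification):
      </p>

```python
# Sketch of the Negation Agreement feature: both sentences must agree on
# the presence (or absence) of a Portuguese negation term. The term list
# comes from the feature description above.
NEGATION_TERMS = {"jamais", "nada", "nenhum", "ninguém", "nunca", "não"}

def has_negation(sentence):
    """True when any token of the sentence is a known negation term."""
    return any(tok in NEGATION_TERMS for tok in sentence.lower().split())

def negation_agreement(sent_a, sent_b):
    """True when both sentences are negated, or neither is."""
    return has_negation(sent_a) == has_negation(sent_b)

# The pair from the example disagrees: only the second sentence is negated.
print(negation_agreement("O menino está pulando", "Ninguém está pulando"))  # False
```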
      <p>
        – O garoto está em casa – The young man is at home
        – O menino está em casa – The boy is at home
        7. Adjectives Similarity: returns the number of similar adjectives between the two sentences. As in Nouns Similarity, we use the synonymy relation and lexical similarity, but consider only adjectives.
        8. Gender: returns the number of tokens that agree in gender (male/female).
        9. Number: returns the number of tokens that agree in number (singular/plural).
      </p>
      <p>
        To identify the number and gender features we use SNLP, a Stilingue proprietary tool.
        10. Jaccard Similarity: returns a real number containing the Jaccard [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] similarity between the two sentences. Here we perform a preprocessing step: first we remove determiners (although determiners may change a referent, in the ASSIN-2 shared task there is an agreement that, for example, "the girl" and "a girl" denote the same entity); second, we sort the tokens alphabetically, because to calculate Jaccard we want to consider only the tokens, not their order in the sentence; finally, we calculate the Jaccard similarity. Each sentence is modified as in the following example:
        – A mulher está cortando cebola → cebola cortando está mulher
        – The woman is cutting onion → cutting is onion woman
        11. Verb+Participle: returns true when both sentences have a verb+participle construction, which does not necessarily have to be the same, as in:
        – O urso está sentado – The bear is sitting
        – O urso está deitado – The bear is lying down
        12. Verb+Participle+Equals: returns true when both sentences have the same verb+participle construction, as in:
        – O urso está sentado – The bear is sitting
        – O urso está sentado – The bear is sitting
        13. Conjunction_E_A: returns true when sentence "A" has the "e" (and) conjunction, which helps in cases such as:
        – Um menino e uma menina estão caminhando – A boy and a girl are walking
        – Duas pessoas estão andando – Two people are walking
        14. Conjunction_E_B: the same as Conjunction_E_A, but for sentence "B".
        15. TokensDif: calculates the difference in the number of tokens between sentences "A" and "B", not counting determiners. In the example below, TokensDif returns 2, because sentence A has six tokens and sentence B has four:
        – Uma mulher não está fritando algum alimento – A woman is not frying any food
        – Uma mulher está fritando comida – A woman is frying food
        16. Same Word: returns an integer containing the number of exactly equal words (common words) in the two sentences. Here we consider only verbs, nouns, and adjectives, and apply only lexical matching.
        17. Same Subject: returns true when the sentences have the same subject.
        18. Cosine Similarity: returns the cosine similarity of the two sentences, computed over the averaged word vectors of each sentence. Here we use the FastText Skip-Gram 300d embeddings built by NILC (http://nilc.icmc.usp.br/embeddings) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
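      <p>
        Features 10, 15, and 18 can be sketched directly. In the sketch below, the determiner list is a simplified assumption and a toy embedding table stands in for the NILC FastText vectors [9]:
      </p>

```python
import numpy as np

# Sketch of three of the features above. The determiner list is a simplified
# assumption; the real system uses FastText skip-gram 300-d vectors from
# NILC [9], replaced here by whatever embedding table `emb` is passed in.
DETERMINERS = {"o", "a", "os", "as", "um", "uma", "uns", "umas"}

def content_tokens(sentence):
    """Lowercased tokens with determiners removed."""
    return [t for t in sentence.lower().split() if t not in DETERMINERS]

def jaccard(sent_a, sent_b):
    """Feature 10: Jaccard similarity over determiner-free token sets."""
    sa, sb = set(content_tokens(sent_a)), set(content_tokens(sent_b))
    return len(sa & sb) / len(sa | sb)

def tokens_dif(sent_a, sent_b):
    """Feature 15: absolute difference in token counts, ignoring determiners."""
    return abs(len(content_tokens(sent_a)) - len(content_tokens(sent_b)))

def cosine(sent_a, sent_b, emb):
    """Feature 18: cosine of the averaged word vectors of each sentence."""
    va = np.mean([emb[t] for t in sent_a.lower().split() if t in emb], axis=0)
    vb = np.mean([emb[t] for t in sent_b.lower().split() if t in emb], axis=0)
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

Note that Jaccard over token sets is already order-invariant, so the alphabetical sorting described above only makes the preprocessing deterministic; it does not change the value.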
      <p>
        Model Set Up and Runs
        In the shared task, each participant was encouraged to submit three output files, each of which could contain results for one or both of the proposed tasks. We performed experiments using three distinct configurations to produce the models. For the first model we use traditional supervised machine learning: we train a model with Random Forest [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and our set of proposed features. For the second and third models the Wide and Deep architecture was used, combining BERT-Base multilingual [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], Universal Sentence Encoder-Large multilingual [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and our set of proposed features. In table 1 we detail the setup of each model.
      </p>
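      <p>
        The fusion at the core of the Wide and Deep models can be illustrated as follows. This is only a sketch of the data flow, assuming random stand-ins: the dense vector replaces the real BERT / Universal Sentence Encoder representation, and the head weights replace trained parameters.
      </p>

```python
import numpy as np

# Minimal sketch of the Wide and Deep fusion used for the second and third
# models: the 18 handcrafted features (wide part) are concatenated with a
# dense sentence-pair embedding (deep part) before a regression head.
# Both the embedding and the weights are random stand-ins, not the real
# encoder outputs or trained parameters.
rng = np.random.default_rng(0)

wide = rng.random(18)       # the 18 handcrafted features of subsection 3.1
deep = rng.random(512)      # stand-in for a dense sentence-pair encoding

fused = np.concatenate([wide, deep])   # joint (530,) representation

# Toy regression head for semantic relatedness (scores clipped to [1, 5]).
W = rng.normal(size=fused.shape[0])
score = float(np.clip(fused @ W, 1.0, 5.0))
print(fused.shape, score)
```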
      <p>In the first run we use just Random Forest and our set of features. We tested several other traditional supervised machine learning algorithms, such as Multilayer Perceptron, Linear Regression, Naive Bayes, Decision Table, J48, and Random Tree, among others; however, Random Forest easily outperformed all of them. In the second and third runs we use the Wide and Deep architecture. Note that the third model was used in both the second and third runs, because in our tests we did not find a model that outperforms BERT-Base for the textual entailment task.</p>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>In table 3 we show the results of the ASSIN-2 shared task for the two proposed tasks (for the baseline model we unify its runs and show only its best results). There is a great distance between the Wide and Deep architecture and traditional supervised machine learning. Compared with the best (winning) models, our models present very close results: our model achieved 1.7 points less in F1 and 1.68 points less in accuracy for the TE task, and for the SR task it presented a Pearson coefficient 0.009 lower (for run 3) and 0.026 lower (for run 2). However, our model presents a better mean squared error (MSE). Since MSE heavily penalizes outliers, we can say that our model is more consistent than the others. An error analysis is presented in Section 5.</p>
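      <p>
        The distinction between the two metrics can be made concrete: Pearson correlation ignores systematic shifts in the predictions, while MSE penalizes every squared deviation. The small illustration below uses made-up score vectors, not ASSIN-2 data.
      </p>

```python
import numpy as np

# Made-up score vectors (not ASSIN-2 data) showing that Pearson and MSE can
# rank two models differently: model B correlates perfectly with the gold
# scores but carries a constant bias, which MSE penalizes heavily.
gold    = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 3.0, 2.0, 4.0])
model_a = np.array([1.2, 2.1, 2.8, 3.9, 4.8, 3.2, 2.2, 3.8])  # small, even errors
model_b = gold + 1.0                                          # perfectly correlated, biased

def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])

def mse(x, y):
    return float(np.mean((x - y) ** 2))

# model_b wins on Pearson (r = 1.0) yet loses badly on MSE.
print(pearson(gold, model_a), mse(gold, model_a))
print(pearson(gold, model_b), mse(gold, model_b))
```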
      <p>
        In table 4 it is possible to see that over 1550 pairs fall very near the gold scores (differences between 0 and 0.4); for differences from 0.5 to 0.9 there are 618 pairs. It is worth noting that this difference is acceptable, because even in the annotation process many human annotators disagree within this range. We can also see that, among all 2448 pairs of the test corpus, there is just one sample with a difference above 3.0. In this pair there are many identical words, but the sentences refer to distinct facts: for the example below, our model predicted a similarity of 4.5 while the gold score is 1.5.
      </p>
      <p>
        – um cachorro preto e um branco estão correndo alegremente na grama – a black and a white dog are running happily in the grass
        – uma pessoa negra vestindo branco está correndo alegremente com o cachorro na grama – a black person wearing white is running happily with the dog on the grass
      </p>
      <p>We identified a limitation in our model: the synonymy feature only considers single words. The second main source of error occurs when sentence "A" has a verb + participle construction and sentence "B" has a gerund, or vice versa, such as:
        – O pelo de um gato está sendo penteado por uma garota – A cat's fur is being combed by a girl
        – Uma pessoa está penteando o pelo de um gato – A person is combing a cat's fur
        – O cara está comendo uma banana – The guy is eating a banana
        – Uma banana está sendo comida por um cara – A banana is being eaten by a guy</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>
        In this paper we presented our models for two important tasks, Semantic Relatedness and Textual Entailment. Our models are based on 18 features that cover natural language patterns, and on the Wide and Deep architecture, which mixes our linguistic features with deep learning features. The results show that our models are competitive. Moreover, although MSE is not the official metric, we believe that our semantic relatedness model provides a good solution for the proposed task, mainly when a more reliable model with fewer outliers is needed. As future work we want to improve our semantic features in order to recognize referential expressions. We also intend to explore ConceptNet [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and BabelNet [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] to provide more robust semantic knowledge to our models.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>L. V.</given-names>
            <surname>Avanço</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. d. G. V.</given-names>
            <surname>Nunes</surname>
          </string-name>
          .
          <article-title>Lexicon-based sentiment analysis for reviews of products in Brazilian Portuguese</article-title>
          .
          <source>In 2014 Brazilian Conference on Intelligent Systems</source>
          , pages
          <fpage>277</fpage>
          -
          <lpage>281</lpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C. F.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fillmore</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. B.</given-names>
            <surname>Lowe</surname>
          </string-name>
          .
          <article-title>The Berkeley Framenet project</article-title>
          .
          <source>In Proceedings of the 17th International Conference on Computational Linguistics</source>
          , pages
          <fpage>86</fpage>
          -
          <lpage>90</lpage>
          , Quebec, Canada,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Bouckaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kirkby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Reutemann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seewald</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Scuse</surname>
          </string-name>
          .
          <source>Weka manual for version 3-7-8</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Limtiaco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. S.</given-names>
            <surname>John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guajardo-Cespedes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Sung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Strope</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Kurzweil</surname>
          </string-name>
          .
          <article-title>Universal sentence encoder</article-title>
          .
          <source>CoRR</source>
          , abs/1803.11175,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>H.-T.</given-names>
            <surname>Cheng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Koc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Harmsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shaked</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Chandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Aradhye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ispir</surname>
          </string-name>
          , et al.
          <article-title>Wide &amp; deep learning for recommender systems</article-title>
          .
          <source>In Proceedings of the 1st workshop on deep learning for recommender systems</source>
          , pages
          <fpage>7</fpage>
          -
          <lpage>10</lpage>
          . ACM,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>CoRR</source>
          , abs/1810.04805,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>E. R.</given-names>
            <surname>Fonseca</surname>
          </string-name>
          .
          <article-title>Reconhecimento de implicação textual em português</article-title>
          .
          <source>PhD thesis</source>
          , Universidade de São Paulo,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>H. Gonçalo</given-names>
            <surname>Oliveira</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Gomes</surname>
          </string-name>
          .
          <article-title>ECO and Onto.PT: A flexible approach for creating a Portuguese wordnet automatically</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>48</volume>
          (
          <issue>2</issue>
          ):
          <fpage>373</fpage>
          -
          <lpage>393</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>N.</given-names>
            <surname>Hartmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fonseca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Shulby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Treviso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Aluisio</surname>
          </string-name>
          .
          <article-title>Portuguese word embeddings: Evaluating on word analogies and natural language tasks</article-title>
          .
          <source>arXiv preprint arXiv:1708.06025</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>J.</given-names>
            <surname>Howard</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          .
          <article-title>Fine-tuned language models for text classification</article-title>
          .
          <source>CoRR</source>
          , abs/1801.06146,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goodman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Soricut</surname>
          </string-name>
          . ALBERT:
          <article-title>A lite BERT for self-supervised learning of language representations</article-title>
          .
          <source>CoRR</source>
          , abs/1909.11942,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>M.</given-names>
            <surname>Levandowsky</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Winter</surname>
          </string-name>
          .
          <article-title>Distance between sets</article-title>
          .
          <source>Nature</source>
          ,
          <volume>234</volume>
          (
          <issue>5323</issue>
          ):
          <fpage>34</fpage>
          -
          <lpage>35</lpage>
          ,
          <year>1971</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <article-title>Roberta: A robustly optimized BERT pretraining approach</article-title>
          .
          <source>CoRR</source>
          , abs/1907.11692,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>A. L.</given-names>
            <surname>Maas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Daly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. T.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Potts</surname>
          </string-name>
          .
          <article-title>Learning word vectors for sentiment analysis</article-title>
          .
          <source>In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT '11</source>
          , pages
          <fpage>142</fpage>
          -
          <lpage>150</lpage>
          , Stroudsburg, PA, USA,
          <year>2011</year>
          .
          <article-title>Association for Computational Linguistics</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>R.</given-names>
            <surname>Navigli</surname>
          </string-name>
          and
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          . BabelNet:
          <article-title>The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>193</volume>
          :
          <fpage>217</fpage>
          -
          <lpage>250</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>L.</given-names>
            <surname>Real</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Fonseca</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.</given-names>
            <surname>Gonçalo Oliveira</surname>
          </string-name>
          .
          <article-title>The ASSIN 2 shared task: Evaluating Semantic Textual Similarity and Textual Entailment in Portuguese</article-title>
          .
          <source>In Proceedings of the ASSIN 2 Shared Task: Evaluating Semantic Textual Similarity and Textual Entailment in Portuguese, CEUR Workshop Proceedings</source>
          , page [In this volume].
          <source>CEUR-WS.org</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Scarton</surname>
          </string-name>
          .
          <article-title>VerbNet.BR: construção semiautomática de um léxico verbal online e independente de domínio para o português do brasil</article-title>
          .
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>R.</given-names>
            <surname>Speer</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Havasi</surname>
          </string-name>
          .
          <article-title>Representing general relational knowledge in ConceptNet 5</article-title>
          .
          <source>In Proceedings of the Eighth International Conference on Language Resources and Evaluation</source>
          , pages
          <fpage>3679</fpage>
          -
          <lpage>3686</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kasneci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>YAGO: a core of semantic knowledge</article-title>
          .
          <source>In Proceedings of the 16th International Conference on World Wide Web</source>
          , pages
          <fpage>697</fpage>
          -
          <lpage>706</lpage>
          , Banff, AB, Canada,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>E. M.</given-names>
            <surname>Voorhees</surname>
          </string-name>
          .
          <article-title>The TREC question answering track</article-title>
          .
          <source>Nat. Lang. Eng.</source>
          ,
          <volume>7</volume>
          (
          <issue>4</issue>
          ):
          <fpage>361</fpage>
          -
          <lpage>378</lpage>
          , Dec.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>A.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hill</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          .
          <article-title>GLUE: A multi-task benchmark and analysis platform for natural language understanding</article-title>
          .
          <source>CoRR</source>
          , abs/1804.07461,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          .
          <article-title>Character-level convolutional networks for text classification</article-title>
          .
          <source>In Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 1, NIPS'15</source>
          , pages
          <fpage>649</fpage>
          -
          <lpage>657</lpage>
          , Cambridge, MA, USA,
          <year>2015</year>
          .
          <publisher-name>MIT Press</publisher-name>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>