<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ASAPPpy: a Python Framework for Portuguese STS</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>CISUC, University of Coimbra</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>DEI, University of Coimbra</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>ISEC, Polytechnic Institute of Coimbra</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <fpage>14</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>This paper describes ASAPPpy - a framework fully developed in Python for computing Semantic Textual Similarity (STS) between Portuguese texts - and its participation in the ASSIN 2 shared task on this topic. ASAPPpy follows previous versions of ASAPP. It uses a regression method for learning an STS function from annotated sentence pairs, considering a variety of lexical, syntactic, semantic and distributional features. Yet, unlike what was done in the past, ASAPPpy is a standalone framework with no need to use other projects in the feature extraction or learning phases. It may thus be extended and reused by the team. Despite being outperformed by deep learning approaches in ASSIN 2, ASAPPpy can explain the learned model through the features selected as most relevant, as well as reveal which types of features play a key role in STS learning.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Textual Similarity</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Semantic Relations</kwd>
        <kwd>Word Embeddings</kwd>
        <kwd>Supervised Machine Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Semantic Textual Similarity (STS) aims at computing the proximity of meaning
of two fragments of text. Shared tasks on this topic have been organised in the
scope of SemEval 2012 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] to 2017 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], targeting English, Arabic and Spanish. In
2016, the ASSIN shared task [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] focused on STS for Portuguese, and its
collection was made available. ASSIN 2 was the second edition of this task, with minor
differences in the STS annotation guidelines and covering simpler text.
      </p>
      <p>
        ASAP(P) is the name of a collection of systems developed in CISUC for
computing STS based on a regression method and a set of lexical, syntactic,
semantic and distributional features extracted from text. It has participated
in several STS evaluations, for English and Portuguese, but was only recently
integrated in two single independent frameworks: ASAPPpy, in Python, and
ASAPPj, in Java. Both versions of ASAPP participated in ASSIN 2, but this
paper is focused on the former, ASAPPpy. (This work was funded by FCT’s
INCoDe 2030 initiative, in the scope of the demonstration project AIA, “Apoio
Inteligente a empreendedores (chatbots)”.) Also, although both ASSIN
and ASSIN 2 cover STS and Textual Entailment (TE), this paper is mainly
focused on the approach followed for STS, including feature engineering, feature
selection and learning methods. The performance of ASAPPpy in STS was
satisfactory for an approach that follows traditional supervised machine learning,
also enabling an analysis of the most relevant features, but it was clearly
outperformed by approaches based on deep learning or its products, including recent
transformer-based language models, like BERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>In the remainder of the paper, we overview previous work that led to the
development of ASAPPpy, focused on previous versions of this system. We then
describe the features exploited by ASAPPpy and report on the selection of the
regression method and features used, also covering the official results in ASSIN 2,
which we briefly discuss.</p>
    </sec>
    <sec id="sec-2">
      <title>An overview of ASAP(P) for STS</title>
      <p>
        The first version of ASAP [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] dates from 2014, with the participation in the
SemEval task Evaluation of Compositional Distributional Semantic Models on
Full Sentences through Semantic Relatedness and Textual Entailment [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], in
English, though only on the subtask of semantic relatedness. Here, a set of
65 features was extracted from sentence pairs, ranging from overlapping token
counts to phrase chunks and topic distributions.
      </p>
      <p>From the first participation onwards, we proposed to learn a model based on
regression analysis that considered different textual features, covering distinct aspects
of natural language processing. Lexical, syntactic, semantic and distributional
features were thus extracted from sentence pairs. The main difference between
successive versions is the increasing adoption of distributional features,
initially based on topic modeling, and more recently on different word embedding
models. The main contribution was in the use of complementary features for learning
an STS function, a part of the challenge of building Compositional Distributional
Semantic Models.</p>
      <p>
        One year later, ASAP-II [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] participated in a task that was closer to our
current goal: Semantic Textual Similarity (STS) at SemEval 2015 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Even though
the task covered three languages – English, Spanish and Arabic – we only
targeted English. At first, the goal of STS may look similar to that of the
SemEval 2014 task, but the available datasets were very different from each other.
One such difference was the occurrence of named entities in the SemEval 2015
dataset. To address this, ASAP-II retrieved named entities and compound nouns
from DBPedia [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], an effort to extract structured information from Wikipedia.
Due to DBPedia’s central role in the Linked Data initiative, it is also connected
to WordNet [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which enables the connection between some DBPedia entities
and their abstract category.
      </p>
      <p>
        Finally, one year later, motivated by the organisation of the first ASSIN
shared task [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], ASAP focused on Portuguese, becoming ASAPP – Automatic
Semantic Alignment for Phrases applied to Portuguese [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The first ASAPP
exploited several heuristics over Portuguese semantic networks [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for extracting
semantic features, beyond lexical and syntactic ones. As with its
predecessors, several tools were used for the extraction of morpho-syntactic
features, including tokenization, part-of-speech tagging, lemmatization, phrase
chunking, and named entity recognition. For the first ASSIN, this was achieved
with NLPPort [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], built on top of the OpenNLP framework, though with some
modifications targeting Portuguese processing.
      </p>
      <p>
        The original participation in ASSIN did not exploit distributional features.
Only later, word embeddings (word2vec CBOW [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]) and character n-grams
were adopted by ASAPP (version 2.0) [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. When trained on the ASSIN training
collections, adding distributional features to the others led to improvements in
the performance of STS. We also concluded that, although the ASSIN collections
were divided between European and Brazilian Portuguese, better results were
achieved when a single model was trained on both.
      </p>
      <p>Up until this point, all versions of ASAP(P) could not be seen as a single
well-integrated solution. Different features were extracted with different tools, not
always applying the same pre-processing or even using the same programming
languages, and sometimes by different people. After extraction, all features were
integrated in a single file, then used in the learning process. Towards better
cohesion and easier usability, in 2018, we started to work on the integration
of all feature extraction procedures in a single framework. Yet, due to specific
circumstances, we ended up developing two versions of ASAPP: ASAPPpy, fully
in Python, and ASAPPj, fully in Java. Each was developed by a different person,
respectively José Santos and Eduardo Pais, both supervised by Ana Alves. This
paper is focused on ASAPPpy.</p>
      <p>
        Besides training and testing both versions of ASAPP in the collection of
the first ASSIN, their development coincided with ASSIN 2, where they both
participated. Curiously, the data of ASSIN 2 is closer to that of SemEval 2014’s
task [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], where the first ASAP participated.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Feature Engineering for Portuguese STS</title>
      <p>The main difference between ASAPPpy and previous versions of ASAPP is that
it is fully implemented in Python. This includes all pre-processing, feature
extraction, learning, optimization and testing steps.</p>
      <p>ASAPPpy follows a supervised learning approach. Towards the participation
in ASSIN 2, different models were trained in the training collection of ASSIN 2,
and some also in the collections of the first ASSIN (hereafter ASSIN 1). Both
collections have the same XML-like format, where a similarity score (between 1
and 5) and an entailment label are assigned to each pair of sentences, based on
the opinion of several human judges. The first sentence of the pair is identified
by t and the second by h, which stand for text and hypothesis, respectively.</p>
      <p>The ASSIN 1 collection comprises 10,000 pairs, divided in two training
datasets, each with 3,000 pairs, and two testing datasets, each with 2,000,
covering the European-Portuguese (PTPT) and Brazilian-Portuguese (PTBR)
variants. The ASSIN 2 collection is divided into training and validation datasets,
with 6,500 and 500 pairs, respectively, and a testing dataset, with 3,000 pairs
whose similarity our model was developed to predict. In contrast to ASSIN 1,
the ASSIN 2 collection only covers the Brazilian-Portuguese (PTBR) variant.</p>
      <p>
        To compute the semantic similarity between the ASSIN sentence pairs, a
broad range of features was initially extracted, including lexical, syntactic,
semantic and distributional. All features were obtained using standard Python as
well as a set of external libraries, namely: NLTK [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], for getting the token and
character n-grams; NLPyPort, a recent Python port of the NLPPort toolkit [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]
based on NLTK, for Part-of-Speech (PoS) tagging, Named Entity
Recognition (NER) and lemmatisation; Gensim [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], for removing non-alphanumeric
characters and multiple white spaces, and, in combination with scikit-learn [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ],
to extract the distributional features. Semantic features were based on a set of
Portuguese relational triples (see section 3.3) and distributional features relied
on a set of pre-trained Portuguese word embedding models (see section 3.4).
      </p>
      <p>Table 1 summarises all the features extracted, to be described in more detail
in the remainder of this section.</p>
      <p>Features:
– Common token 1/2/3-grams (Dice, Jaccard, Overlap coefficients)
– Common character 2/3/4-grams (Dice, Jaccard, Overlap coefficients)
– Difference between number of each PoS-tag (25 distinct tags)
– Semantic relations between tokens in both sentences (4 types)
– Difference between number of NEs of each category (10 categories)
– Difference between number of NEs
– TF-IDF vectors cosine
– Average word-embeddings cosine (5 models)
– Average TF-IDF weighted word-embeddings cosine (5 models)
– Token n-grams binary vectors cosine
– Character n-grams binary vectors cosine</p>
      <p>Lexical features compute the similarity between the sets and sequences of tokens
and characters used in both sentences of the pair. This is achieved with the
Jaccard, Overlap and Dice coefficients, each computed between the sets of token
n-grams, with n = 1, n = 2 and n = 3, and character n-grams, with n = 2,
n = 3 and n = 4, individually. In total, 18 lexical features were extracted given
that, for each n-gram size, both token and character, we computed the three different
coefficients. Figure 1 illustrates how sentences were split into n-grams, in this
particular case character 2-grams, and provides the values of the coefficients
computed over them, used as features. (NLPyPort is available at
https://github.com/jdportugal/NLPyPort.)</p>
      <p>t: Uma pessoa tem cabelo loiro e esvoaçante e está tocando violão</p>
      <p>Character n-grams of size 2 in t: {Um, ma, pe, es, ss, so, oa, te, em, ca, ab,
be, el, lo, lo, oi, ir, ro, es, sv, vo, oa, aç, ça, an, nt, te, es, st, tá, to, oc, ca, an, nd,
do, vi, io, ol, lã, ão}
h: Um guitarrista tem cabelo loiro e esvoaçante</p>
      <p>Character n-grams of size 2 in h: {Um, gu, ui, it, ta, ar, rr, ri, is, st, ta, te,
em, ca, ab, be, el, lo, lo, oi, ir, ro, es, sv, vo, oa, aç, ça, an, nt, te}</p>
      <p>Jaccard(T, H) = |T ∩ H| / |T ∪ H| = 20/42 = 0.4762</p>
      <p>Overlap(T, H) = |T ∩ H| / min(|T|, |H|) = 20/28 = 0.7143</p>
      <p>Dice(T, H) = 2|T ∩ H| / (|T| + |H|) = 40/62 = 0.6452</p>
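      <p>These coefficients can be sketched in a few lines of Python. This is an illustrative reimplementation, not the exact ASAPPpy code; following the example above, character n-grams are taken word by word.</p>
      <p>
```python
def char_ngrams(sentence, n):
    """Set of character n-grams, taken word by word (none span a space)."""
    grams = set()
    for word in sentence.split():
        for i in range(len(word) - n + 1):
            grams.add(word[i:i + n])
    return grams

def jaccard(a, b):
    """|A intersection B| / |A union B|."""
    union = len(a.union(b))
    return len(a.intersection(b)) / union if union else 0.0

def overlap(a, b):
    """|A intersection B| / min(|A|, |B|)."""
    smaller = min(len(a), len(b))
    return len(a.intersection(b)) / smaller if smaller else 0.0

def dice(a, b):
    """2 |A intersection B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a.intersection(b)) / total if total else 0.0
```
      </p>
      <p>Applying the three coefficients to the token 1/2/3-gram and character 2/3/4-gram sets of a pair yields the 18 lexical features.</p>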
      <p>Two variants of the previous lexical features were considered only for ASSIN 2.
Their value is the cosine similarity between binary vectors obtained from
each sentence as follows: (i) extract the list of n-grams occurring in sentence t,
in h, or in both, considering different values of n; (ii) represent each sentence as
a binary vector where each dimension corresponds to one of the extracted
n-grams and is 1 if the n-gram occurs in that sentence, or 0 otherwise. This was
done for token 1/2/3-grams (first feature) and character 2/3/4-grams (second
feature). Figure 2 illustrates the computation of this alternative character
n-gram based feature for the sentences used in the previous examples.</p>
      <p>The only syntactic features exploited were based on the PoS tags assigned to
the tokens in each sentence of the pair, namely the absolute difference between
the number of occurrences of each PoS tag (25 distinct) in sentence t and
those in sentence h. Considering the sentences used in the previous example,
Figure 3 shows the PoS tags for each word and the array of features obtained
after applying the aforementioned method. In these two sentences, only five
distinct tags were identified, which means that for the remaining 20 the feature
has value zero.</p>
      <p>t: Uma pessoa tem cabelo loiro e esvoaçante e está tocando violão</p>
      <p>Character 2/3/4-grams in t: {um, ma, pe, es, ss, so, oa, te, em, ca, ab, be, el,
lo, lo, oi, ir, ro, es, sv, vo, oa, aç, ça, an, nt, te, es, st, tá, to, oc, ca, an, nd, do, vi,
io, ol, lã, ão, uma, pes, ess, sso, soa, tem, cab, abe, bel, elo, loi, oir, iro, esv, svo,
voa, oaç, aça, çan, ant, nte, est, stá, toc, oca, can, and, ndo, vio, iol, olã, lão, pess,
esso, ssoa, cabe, abel, belo, loir, oiro, esvo, svoa, voaç, oaça, açan, çant, ante, está,
toca, ocan, cand, ando, viol, iolã, olão}
h: Um guitarrista tem cabelo loiro e esvoaçante</p>
      <p>Character 2/3/4-grams in h: {um, gu, ui, it, ta, ar, rr, ri, is, st, ta, te, em, ca,
ab, be, el, lo, lo, oi, ir, ro, es, sv, vo, oa, aç, ça, an, nt, te, gui, uit, ita, tar, arr, rri,
ris, ist, sta, tem, cab, abe, bel, elo, loi, oir, iro, esv, svo, voa, oaç, aça, çan, ant,
nte, guit, uita, itar, tarr, arri, rris, rist, ista, cabe, abel, belo, loir, oiro, esvo, svoa,
voaç, oaça, açan, çant, ante}
Binary vector t: [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1,
1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1,
1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1,
1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0]
Binary vector h: [1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1,
1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 0,
1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1]
Cosine(t, h) = 0.5955
      </p>
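      <p>A minimal sketch of this binary-vector feature (illustrative, not the exact ASAPPpy implementation): each sentence is mapped onto the union vocabulary of n-grams, and the cosine of the two binary vectors is the feature value.</p>
      <p>
```python
import math

def binary_vectors(grams_t, grams_h):
    """Binary vectors over the union of the n-grams of both sentences."""
    vocab = sorted(grams_t.union(grams_h))
    vec_t = [1 if g in grams_t else 0 for g in vocab]
    vec_h = [1 if g in grams_h else 0 for g in vocab]
    return vec_t, vec_h

def cosine(u, v):
    """Cosine similarity between two equally sized vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```
      </p>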
      <sec id="sec-3-1">
        <title>Semantic Features</title>
        <p>
          Language is flexible in a way that the same idea can be expressed through
different words, generally related by well-known semantic relations, such as synonymy
or hypernymy. Such relations are implicitly mentioned in dictionaries and
explicitly encoded in wordnets and other lexical knowledge bases (LKBs). In order to
extract the semantic relations between words in each pair of sentences, a set of
triples, in the form word1 Semantic-Relation word2, was used. They were
acquired from ten lexical knowledge bases (LKBs) for Portuguese [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] and, for this
work, only those that occurred in at least three LKBs were considered. Based
on this, four features of this kind were computed, by counting the number of
semantic relations that existed between words in sentence t and words in sentence
h and then normalising the result. The following semantic relations were
considered: (i) synonymy; (ii) hypernymy/hyponymy; (iii) antonymy; (iv) any other
relation covered by the set of triples. Before searching for relations, words were
lemmatized with NLPyPort. Considering the sentences used in the two previous
examples, table 2 shows the array of features obtained with the aforementioned
method. In this case, there was a single relation, pessoa (person) hypernym-of
guitarrista (guitar player).
        </p>
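        <p>One way to sketch this feature extraction follows, assuming the triples are held as (word1, relation, word2) tuples; the relation names and the normalisation by the number of lemma pairs are illustrative choices, not necessarily those of ASAPPpy.</p>
        <p>
```python
def relation_features(lemmas_t, lemmas_h, triples):
    """Count, per relation type, the triples linking a lemma of sentence t
    to a lemma of sentence h, then normalise by the number of lemma pairs."""
    counts = {"synonym-of": 0, "hypernym-of": 0, "antonym-of": 0, "other": 0}
    set_t, set_h = set(lemmas_t), set(lemmas_h)
    for w1, rel, w2 in triples:
        # A relation counts if it links a word of t with a word of h
        if (w1 in set_t and w2 in set_h) or (w1 in set_h and w2 in set_t):
            key = rel if rel in counts else "other"
            counts[key] += 1
    norm = max(len(set_t) * len(set_h), 1)
    return {rel: c / norm for rel, c in counts.items()}
```
        </p>
        <p>For the running example, the only match would be the pessoa hypernym-of guitarrista triple.</p>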
        <p>Besides semantic relations, Named Entities (NE) were also exploited, due to
their importance for understanding the meaning of text. Although the collection
of ASSIN 2 would not include NEs, these features were still exploited, considering
the application of the model to other tasks. Computed features included the
absolute difference between the number of entities of each type identified in
sentence t and those in sentence h. As ten different NE types were recognized (i.e.,
Abstraction, Event, Thing, Place, Work, Organization, Person, Time, Value,
Other), this resulted in ten features, plus one for the absolute difference of the
total number of NEs between the sentences.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Distributional Features</title>
        <p>Distributional features were based on the TF-IDF matrix of the corpus, which
allowed the representation of each sentence as a vector. The first feature of this
kind was the cosine similarity between the TF-IDF vectors of the two sentences.</p>
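        <p>With scikit-learn, this feature can be sketched as follows; the corpus is a toy stand-in for the ASSIN sentences.</p>
        <p>
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Fit the TF-IDF matrix on the corpus of sentences (toy corpus here)
corpus = [
    "uma pessoa tem cabelo loiro e esvoacante",
    "um guitarrista tem cabelo loiro e esvoacante",
]
tfidf = TfidfVectorizer().fit_transform(corpus)

# The feature is the cosine between the TF-IDF vectors of t and h
feature = cosine_similarity(tfidf[0], tfidf[1])[0, 0]
```
        </p>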
        <p>
          In addition to the TF-IDF matrix, and given the importance of
distributional similarity models for computing semantic relatedness, four sources of
pre-trained word embeddings for Portuguese, based on different models and data, were also
exploited, namely: (i) NILC embeddings [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ], which offer a wide variety of pre-trained
embeddings, learned with different models in a large Portuguese corpus.
From those, CBOW Word2vec and GloVe, both with 300-dimensioned vectors,
were selected; (ii) fastText.cc embeddings [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], which provide word vectors for
157 languages, trained on Common Crawl and Wikipedia using fastText. For the
present system, only the Portuguese word vectors were used; (iii) ConceptNet
Numberbatch [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ], obtained by applying a generalisation of the retrofitting
technique [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], which improves the representation of words in the form of vectors by
utilising the ConceptNet knowledge base. Given that the pre-trained vectors used
are multilingual, only the vectors of Portuguese words were used; (iv) PT-LKB
embeddings [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], a different distributional model, not learned from corpora, but
built by applying the node2vec method to the same ten LKBs used for the
semantic features [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The vectors used had 64 dimensions, the value that achieved
best results in word similarity tests [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>For each model, two different features were considered, both after the
conversion of each sentence into a vector computed from the vectors of its tokens. The
difference is in how this sentence vector was created. For the first feature, it was
obtained from the average of the token vectors. For the second, it was computed
from the weighted average of the token vectors, weighted by the TF-IDF value
of each token. In all cases, the similarity of each pair of sentences was computed
with the cosine of their vectors.</p>
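        <p>The two sentence-vector variants can be sketched as below; the embedding dictionary and the weights are toy stand-ins for the pre-trained models and the TF-IDF values.</p>
        <p>
```python
import numpy as np

def sentence_vector(tokens, embeddings, weights=None, dim=300):
    """(Optionally TF-IDF-weighted) average of the token vectors.
    Tokens missing from the embedding model are skipped."""
    vecs, ws = [], []
    for tok in tokens:
        if tok in embeddings:
            vecs.append(embeddings[tok])
            ws.append(1.0 if weights is None else weights.get(tok, 0.0))
    if not vecs or sum(ws) == 0.0:
        return np.zeros(dim)
    return np.average(np.array(vecs), axis=0, weights=ws)
```
        </p>
        <p>The similarity feature for a pair is then the cosine of the two sentence vectors.</p>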
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Training a Portuguese STS model</title>
      <p>
        Based on the extracted features, various regression methods, with
implementation available in scikit-learn [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], were explored for learning an STS model. Since
the work on ASAPPpy started before the ASSIN 2 training collection was
released, initial experiments towards the selection of the regression method were
performed on the collection of ASSIN 1. It was also our goal to analyse whether
the number of features could be reduced, and the impact of such a reduction on
the results. Experiments on this are reported in this section.
      </p>
      <p>The results submitted to ASSIN 2 were obtained with the selected method,
but also trained in the ASSIN 2 training collection, with features selected from
the results in the validation collection. In all experiments, performance was
assessed with the same metrics adopted in ASSIN and other STS tasks, namely
the Pearson correlation (ρ, between -1 and 1) and the Mean Squared Error (MSE)
between the values computed and those in the collection.</p>
      <sec id="sec-4-1">
        <title>Selection of the Regression Method</title>
        <p>Towards the development of the STS models in ASAPPpy, models were trained
with different regression methods, in both the PTPT and PTBR training
collections of ASSIN 1, and then tested individually on the testing collections of each
variant. After initial experiments, three methods were tested, namely: a Support
Vector Regressor (SVR), a Gradient Boosting Regressor (GBR) and a Random
Forest Regressor (RFR), all using scikit-learn’s default setup parameters.</p>
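        <p>This comparison can be sketched as below, on synthetic data standing in for the extracted feature matrix and the gold similarity scores (variable names are illustrative):</p>
        <p>
```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.svm import SVR

# Toy stand-ins for the 12-feature matrix and the gold similarity scores
rng = np.random.default_rng(0)
X_train, X_test = rng.random((200, 12)), rng.random((50, 12))
y_train, y_test = X_train.sum(axis=1), X_test.sum(axis=1)

results = {}
for name, model in [("SVR", SVR()), ("GBR", GradientBoostingRegressor()),
                    ("RFR", RandomForestRegressor())]:
    model.fit(X_train, y_train)          # default scikit-learn parameters
    pred = model.predict(X_test)
    rho = np.corrcoef(y_test, pred)[0, 1]  # Pearson correlation
    mse = mean_squared_error(y_test, pred)
    results[name] = (rho, mse)
```
        </p>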
        <p>Having in mind the efficiency of the model, we further tried to reduce the
dimensionality of the feature set. For this purpose, we explored three types of
feature selection methods, also available in scikit-learn: Univariate, Model-based
and Iterative Feature Selection. In order to assess which method improved the
performance of the model the most, in comparison to each other and to using
all features, the model's coefficient of determination (R²) of the prediction was
used for each method. Although we did not perform any measurement of the
computational costs of these experiments, empirically we were able to assess
that both Univariate and Model-based methods were significantly faster than
Iterative Feature Selection when executed on the same machine.</p>
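        <p>A univariate selection step can be sketched with scikit-learn's SelectPercentile, here on random data standing in for the feature matrix (the percentile value is illustrative):</p>
        <p>
```python
import numpy as np
from sklearn.feature_selection import SelectPercentile, f_regression

rng = np.random.default_rng(1)
X = rng.random((100, 67))    # stand-in for the 67-feature matrix
y = X[:, 0] + 0.5 * X[:, 1]  # stand-in for the gold similarity scores

# Keep the percentile of features best ranked by a univariate F-test
selector = SelectPercentile(score_func=f_regression, percentile=40)
X_reduced = selector.fit_transform(X, y)
```
        </p>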
        <p>We should add that, for these experiments, only 67 of the 71 features
described in section 3 were considered. Four distributional features were only added
later, namely those using Numberbatch embeddings and the binary vectors based
on the presence of n-grams. With the aforementioned feature selection methods,
the initial set of 67 features was reduced to 12, with marginal improvements in
some cases, as the results in Tables 3 and 4 show. Although all selection methods
were tested, the applied selection is the result of an Iterative Feature Selection,
because it was the method leading to the highest performance. In the end, the
selected features were: the Jaccard, Dice and Overlap coefficients for token 1-grams
and character 3-grams; the Jaccard coefficient for character 2-grams; the cosine
similarity between the sentence vectors computed using the TF-IDF matrix; the
fastText.cc word embeddings; and the word2vec, fastText.cc and PT-LKB word
embeddings weighted with the TF-IDF value of each token. This means that
the reduced model only uses lexical and distributional features. It does not use
syntactic or semantic features, though semantic relations should be captured
by the distributional features, namely the word embeddings.</p>
        <p>Tables 3 and 4 report the performance of each model on both ASSIN 1 PTPT
and PTBR testing datasets, respectively before and after feature selection. The
best performing model is based on SVR and achieved a Pearson ρ of 0.72 and
an MSE of 0.63 when tested on the PTPT dataset, using feature selection. For
PTBR, ρ was 0.71 and MSE 0.37 for the same model.</p>
        <sec id="sec-4-1-1">
          <title>Table 3. Performance of different regression methods in ASSIN 1, before feature selection.</title>
          <p>Method: ρ (PTPT) / MSE (PTPT) / ρ (PTBR) / MSE (PTBR)
SVR: 0.66 / 0.71 / 0.67 / 0.42
GBR: 0.71 / 0.67 / 0.70 / 0.39
RFR: 0.71 / 0.65 / 0.71 / 0.38</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Table 4. Performance of different regression methods in ASSIN 1, with feature selection.</title>
          <p>Method: ρ (PTPT) / MSE (PTPT) / ρ (PTBR) / MSE (PTBR)
SVR: 0.72 / 0.63 / 0.71 / 0.37
GBR: 0.71 / 0.66 / 0.70 / 0.39
RFR: 0.72 / 0.64 / 0.71 / 0.38</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>ASSIN 2 STS Model</title>
        <p>Although performed on the ASSIN 1 collection, experiments described in the
previous section support the selection of the Support Vector Regressor (SVR)
as the learning algorithm for the three runs submitted to ASSIN 2. All such
runs were trained considering the same features and algorithm parameterisation,
and were only different in the composition of the training data, which was the
following:
– Run #1 used all available data for Portuguese STS: ASSIN 1 PTPT/PTBR
train and test datasets + ASSIN 2 train and validation datasets, comprising
a total of ≈ 17,000 sentence pairs.
– Run #2 considered that the ASSIN 2 data would be exclusively in Brazilian
Portuguese, so did not use the ASSIN 1 PTPT data, comprising a total of
≈ 12,000 sentence pairs.
– Run #3 had in mind that the ASSIN 1 data could be different enough from
ASSIN 2, thus not useful in this case, so used only the ASSIN 2 training and
validation data, comprising a total of ≈ 7,000 sentence pairs.</p>
        <p>
          Despite originally exploiting the full set of 71 features, all submitted runs were
based on a reduced feature set. Features were selected based on the Pearson ρ
of a model trained on all available data except the ASSIN 2 validation pairs,
and validated on the latter. In the end, models considered only 27 features,
which were the 40% most relevant according to Univariate Statistics for different
percentiles. In this case, this was the feature selection method that led to the
best performance. These are the 27 features effectively considered:
– Jaccard, Overlap and Dice coefficients, each computed between the sets of
token 1/2/3-grams and character 2/3/4-grams.
– Averaged token vectors, computed with the following word embeddings:
word2vec-cbow, GloVe (300-sized, from NILC [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]), fastText.cc [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ],
Numberbatch [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] and PT-LKB [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
– TF-IDF-weighted averaged token vectors, computed with the following
word embeddings: word2vec-cbow, GloVe (300-sized, from NILC [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]),
fastText.cc [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] and Numberbatch [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
        <p>Table 5 shows the official results of each run in the ASSIN 2 test collection.
The best performance was achieved by run #3, with ρ = 0.74 and MSE = 0.60,
despite the fact that this was the model that used the least amount of training
data. Having no improvements with ASSIN 1 data is an indication of the (known)
differences between the ASSIN 1 and ASSIN 2 collections. Such differences may
explain the performance obtained by run #3, in which the data used for training,
being exclusively from ASSIN 2, resulted in a model that could better fit the
testing data. In contrast to the differences in the Pearson ρ, the MSE was similar
for every run, but slightly higher precisely for run #3.</p>
        <p>Table 5. Official results of ASAPPpy in ASSIN 2 STS:
Run #1: ρ = 0.726, MSE = 0.58
Run #2: ρ = 0.730, MSE = 0.58
Run #3: ρ = 0.740, MSE = 0.60</p>
        <p>After the evaluation, we repeated this experiment using the full set of 71
features, to conclude that using all features is not a good option. The Pearson ρ
values achieved this way are equally poor: 0.65, 0.66 and 0.66, respectively, for
the configurations of runs #1, #2 and #3. A curious result is that the MSE
was significantly higher for the run #3 configuration (0.85), the one trained only
on the ASSIN 2 training data, when compared to the others (0.65 and 0.71). In
the official results, run #3 also had the highest MSE, but only by a small margin.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Textual Entailment</title>
        <p>Although it was not the primary focus of ASAPPpy, we tried to learn a classifier
for textual entailment using the same features extracted for STS. Three models
were trained, respectively with the features used in each run, with the
configurations shown in Table 6. Yet, unlike in the STS training phase, we chose to
use the entire ASSIN 1 collection plus the training part of ASSIN 2 for the first two
runs (≈17,000 pairs), selecting the best one (according to 10-fold cross-validation) to
train a third model only on the training part of the ASSIN 2 dataset (≈7,000 pairs).
Regarding the ASSIN 1 dataset, which has three classes (Entailment,
None and Paraphrase), the third class was considered as Entailment, in order to
standardize the two datasets, since ASSIN 2 contains only the first two classes.</p>
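The label standardization and cross-validated model selection described above can be sketched as follows. The labels and feature matrix here are toy stand-ins; the real models use the features extracted for STS:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def standardize_label(label):
    # ASSIN 1's third class, Paraphrase, is folded into Entailment so that
    # both collections share the same two classes.
    return "Entailment" if label == "Paraphrase" else label

labels = ["None", "Entailment", "Paraphrase", "None"]
y = [standardize_label(l) for l in labels]

# Toy feature matrix and balanced labels standing in for the STS features;
# 10-fold cross-validation scores each candidate configuration.
rng = np.random.default_rng(0)
X = rng.random((40, 5))
y40 = ["Entailment", "None"] * 20
scores = cross_val_score(SVC(), X, y40, cv=10)
```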
        <p>The performance of ASAPPpy in this task, below both baselines, is clearly
poor. However, this was not the main goal of our participation. If more effort
were dedicated to this task, we would probably analyse the most relevant features
specifically for entailment, and possibly train new models from this knowledge.</p>
        <p>We described the participation of ASAPPpy in ASSIN 2 and explained some
decisions that led to using SVR-based models trained with a reduced set of
lexical and distributional features. The main difference between the three submitted
runs is the training data, and the best performance (ρ = 0.74 and MSE = 0.60)
was achieved by the model trained only on ASSIN 2 data. Using ASSIN 1 data
led to no improvements, which supports the differences between the two
collections. For instance, ASSIN 2 does not include complex linguistic phenomena nor
named entities, which is not the case of ASSIN 1. But this does not necessarily
mean that ASSIN 2 is easier, which is also suggested by the performance of our
models, only slightly better in ASSIN 2.</p>
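In scikit-learn terms, the SVR-based setup amounts to something like the sketch below. The feature matrix is a toy stand-in and the hyperparameters are illustrative, not the submitted configuration:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X_train = rng.random((100, 8))       # toy lexical/distributional features
y_train = X_train @ rng.random(8)    # toy similarity scores

# Illustrative hyperparameters, not the configuration used in the runs.
model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X_train, y_train)
pred = model.predict(rng.random((5, 8)))
```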
        <p>We see the results achieved as satisfactory, at least for an approach based on
traditional machine learning. Yet, they are clearly outperformed by the approaches
of other teams relying on deep learning or its products. On the other hand, our
results can be interpreted, not only during the extraction of each feature, but also
by applying feature selection during the training phase. For instance, features
exploiting word embeddings and distance metrics between sentences proved to
be the most relevant when computing the STS between phrases in Portuguese.</p>
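Feature relevance of the kind discussed above can be inspected, for instance, with univariate feature selection in scikit-learn. The feature matrix and feature names below are invented for illustration:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression

rng = np.random.default_rng(42)
n = 200
# Invented features: the first two correlate with the target, mimicking
# embedding-based and distance-based features; the third is pure noise.
emb_sim = rng.random(n)
jaccard = rng.random(n)
noise = rng.random(n)
y = 2.5 * emb_sim + 1.5 * jaccard + 0.1 * rng.standard_normal(n)

X = np.column_stack([emb_sim, jaccard, noise])
names = ["embedding_cosine", "jaccard_distance", "random_noise"]

# Keep the two features with the highest univariate F-score w.r.t. y.
selector = SelectKBest(f_regression, k=2).fit(X, y)
selected = [names[i] for i in selector.get_support(indices=True)]
```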
        <p>
          The current version of ASAPPpy and its source code are available from
https://github.com/ZPedroP/ASAPPpy. Still, in the future, we would like to
experiment with contextual word embeddings [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], given their recent positive
performance in a set of different Natural Language Processing tasks. Pre-trained
embeddings of that kind may be further fine-tuned on the ASSIN data, and be
used alone as the representation of each sentence, or as additional features.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Banea</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cardie</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Gazpio</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maritxalar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mihalcea</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uria</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wiebe</surname>
          </string-name>
          , J.:
          <article-title>SemEval-2015 task 2: Semantic textual similarity, English, Spanish and pilot on interpretability</article-title>
          .
          <source>In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval</source>
          <year>2015</year>
          ). pp.
          <fpage>252</fpage>
          -
          <lpage>263</lpage>
          . Association for Computational Linguistics, Denver, Colorado (Jun
          <year>2015</year>
          ), https://www.aclweb.org/anthology/S15-2045
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semeval-2012 task 6: A pilot on semantic textual similarity</article-title>
          .
          <source>In: Proc. 1st Joint Conf. on Lexical and Computational Semantics-Vol. 1: Proc. of main conference and shared task, and Vol. 2: Proc. of 6th Intl. Workshop on Semantic Evaluation</source>
          . pp.
          <fpage>385</fpage>
          -
          <lpage>393</lpage>
          . Association for Computational Linguistics (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Alves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferrugento</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lourenço</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>ASAP: Automatic semantic alignment for phrases</article-title>
          .
          <source>In: SemEval Workshop, COLING 2014</source>
          , Ireland (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Alves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Gonçalo Oliveira, H.:
          <article-title>ASAPP: Alinhamento semântico automático de palavras aplicado ao português</article-title>
          .
          <source>Linguamática 8(2)</source>
          ,
          <fpage>43</fpage>
          -
          <lpage>58</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Alves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Simões,
          <string-name>
            <given-names>D.</given-names>
            , Gonçalo Oliveira, H.,
            <surname>Ferrugento</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>ASAP-II: From the alignment of phrases to textual similarity</article-title>
          .
          <source>In: 9th International Workshop on Semantic Evaluation (SemEval</source>
          <year>2015</year>
          ) (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Alves</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , Gonçalo Oliveira, H.,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          Encarnação, R.:
          <article-title>ASAPP 2.0: Advancing the state-of-the-art of semantic textual similarity for Portuguese</article-title>
          .
          <source>In: Proceedings of 7th Symposium on Languages, Applications and Technologies (SLATE</source>
          <year>2018</year>
          ).
          <source>OASIcs</source>
          , vol.
          <volume>62</volume>
          , pp.
          <volume>12</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          :
          <fpage>17</fpage>
          .
          Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (
          <year>June 2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kobilarov</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cyganiak</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ives</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Dbpedia: A nucleus for a web of open data</article-title>
          . In: Aberer,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.S.</given-names>
            ,
            <surname>Noy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Allemang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.I.</given-names>
            ,
            <surname>Nixon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Golbeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Mika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Maynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Mizoguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Schreiber</surname>
          </string-name>
          ,
          <string-name>
            <surname>G.</surname>
          </string-name>
          ,
          <string-name>
            <surname>Cudré-Mauroux</surname>
          </string-name>
          , P. (eds.)
          <article-title>The Semantic Web</article-title>
          . pp.
          <fpage>722</fpage>
          -
          <lpage>735</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Bird</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Loper</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <source>Natural Language Processing with Python</source>
          . O'Reilly Media
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching word vectors with subword information</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          ,
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Cer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Diab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agirre</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-Gazpio</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Specia</surname>
          </string-name>
          , L.:
          <article-title>SemEval-2017 task 1: Semantic Textual Similarity multilingual and crosslingual focused evaluation</article-title>
          .
          <source>In: Procs. of 11th Intl. Workshop on Semantic Evaluation (SemEval-2017)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          . Association for Computational Linguistics (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In: Proc 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers). pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . Association for Computational Linguistics, Minneapolis, Minnesota (Jun
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Faruqui</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dodge</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jauhar</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>Retrofitting word vectors to semantic lexicons</article-title>
          .
          <source>In: Proceedings of NAACL</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Fellbaum</surname>
          </string-name>
          , C. (ed.):
          <article-title>WordNet: An Electronic Lexical Database (Language, Speech, and Communication)</article-title>
          . The MIT Press (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Santos</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Criscuolo</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Aluísio, S.:
          <article-title>Visão geral da avaliação de similaridade semântica e inferência textual</article-title>
          .
          <source>Linguamática 8(2)</source>
          ,
          <fpage>3</fpage>
          -
          <lpage>13</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15. Gonçalo Oliveira, H.:
          <article-title>Learning word embeddings from Portuguese lexical-semantic knowledge bases</article-title>
          .
          <source>In: Computational Processing of the Portuguese Language - 13th International Conference, PROPOR</source>
          <year>2018</year>
          , Canela, Brazil,
          <source>September 24-26</source>
          ,
          <year>2018</year>
          , Proceedings. LNCS, vol.
          <volume>11122</volume>
          , pp.
          <fpage>265</fpage>
          -
          <lpage>271</lpage>
          . Springer (
          <year>September 2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Gonçalo Oliveira, H.:
          <article-title>A survey on Portuguese lexical knowledge bases: Contents, comparison and combination</article-title>
          .
          <source>Information</source>
          <volume>9</volume>
          (
          <issue>2</issue>
          ) (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Hartmann</surname>
            ,
            <given-names>N.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fonseca</surname>
            ,
            <given-names>E.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shulby</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Treviso</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          , Aluísio, S.M.:
          <article-title>Portuguese word embeddings: Evaluating on word analogies and natural language tasks</article-title>
          .
          <source>In: Proc 11th Brazilian Symposium in Information and Human Language Technology. STIL</source>
          <year>2017</year>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Marelli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bentivogli</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baroni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernardi</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menini</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zamparelli</surname>
          </string-name>
          , R.:
          <article-title>Semeval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment</article-title>
          .
          <source>In: Proceedings of 8th International Workshop on Semantic Evaluation (SemEval</source>
          <year>2014</year>
          ). pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . Association for Computational Linguistics, Dublin, Ireland (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Pedregosa</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Varoquaux</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gramfort</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thirion</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grisel</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubourg</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanderplas</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passos</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cournapeau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brucher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perrot</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duchesnay</surname>
          </string-name>
          , E.:
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          ,
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20. Řehůřek,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Sojka</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>Software Framework for Topic Modelling with Large Corpora</article-title>
          .
          <source>In: Proc LREC 2010 Workshop on New Challenges for NLP Frameworks</source>
          . pp.
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          . ELRA, Valletta, Malta (May
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Gonçalo Oliveira, H.,
          <string-name>
            <surname>Gomes</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>NLPPort: A Pipeline for Portuguese NLP</article-title>
          .
          <source>In: Proceedings of 7th Symposium on Languages, Applications and Technologies (SLATE</source>
          <year>2018</year>
          ).
          <source>OASIcs</source>
          , vol.
          <volume>62</volume>
          , pp.
          <volume>18</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          :
          <fpage>9</fpage>
          .
          <string-name>
            <surname>Schloss</surname>
          </string-name>
Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (
          <year>June 2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Speer</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Havasi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Conceptnet 5.5: An open multilingual graph of general knowledge</article-title>
          .
          <source>In: Proc. 31st AAAI Conference on Artificial Intelligence</source>
          . pp.
          <fpage>4444</fpage>
          -
          <lpage>4451</lpage>
          . San Francisco, California, USA (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>