<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>NLP&amp;IR@UNED at CheckThat! 2020: A Preliminary Approach for Check-Worthiness and Claim Retrieval Tasks using Neural Networks and Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan R. Martinez-Rico</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lourdes Araujo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan Martinez-Romo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Instituto Mixto de Investigacion - Escuela Nacional de Sanidad</institution>
          ,
          <addr-line>IMIENS</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>NLP &amp; IR Group, Dpto. Lenguajes y Sistemas Informaticos Universidad Nacional de Educacion a Distancia (UNED)</institution>
          ,
          <addr-line>Madrid 28040</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Check-Worthiness and Claim Retrieval are two of the first tasks to be performed in the fake news detection pipeline. In this article we present our approach to these tasks as presented in the 2020 edition of the CheckThat! Lab. In task 1, Tweet Check-Worthiness English, we propose a Bi-LSTM model with Glove Twitter embeddings where the number of inputs has been increased with a graph generated from the additional information provided for each tweet. In task 1 Arabic we have followed a similar approach but using a feed forward neural network model with Arabic embeddings. For task 5, Debate Check-Worthiness, we propose a naive Bi-LSTM model with Glove embeddings. Finally, our approach to task 2, Claim Retrieval, is based on a feed forward neural network model with features such as cosine similarity over Universal Sentence Encoder embeddings of tweets and claims, and other linguistic features extracted from both elements.</p>
      </abstract>
      <kwd-group>
<kwd>check-worthiness • claim retrieval • embeddings • graph features</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>One of the problems that current society faces is the spreading of fake news.
The influence of this type of news on the results of the United States presidential
elections a few years ago and, more recently, the disinformation being caused
during the current COVID-19 pandemic are two examples of this phenomenon.
To combat this problem, sites dedicated to checking the veracity of the news
that circulates daily in traditional media and on social networks have
proliferated. These sites normally carry out their work through human experts
who review the news in circulation every day. To facilitate the work of these
experts, systems have been proposed that prioritize the claims to be verified
or that return a list of claims verifying another given as input. On the other
hand, these two tasks can also be part of a larger system of claim verification
and fake news detection.</p>
      <p>Copyright © 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0). CLEF 2020, 22-25
September 2020, Thessaloniki, Greece. This work has been partially supported
by the Spanish Ministry of Science and Innovation within the projects
PROSA-MED (TIN2016-77820-C3-2-R), DOTTHEALTH (PID2019-106942RB-C32)
and EXTRAE II (IMIENS 2019).</p>
      <p>
        This article describes the participation of the NLP&amp;IR@UNED3 team in
tasks T1, T2 and T5 of the CheckThat! Lab at CLEF2020[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref2">2</xref>
]. Tasks T1 and
T5 are dedicated to prioritizing claims to be verified (check-worthiness) and task
T2 is a claim retrieval task.
      </p>
<p>The rest of the article is organized as follows: in section 2 we describe the
different approaches that we have tried for each task, in section 3 we discuss
the results we have obtained, and section 4 is devoted to our conclusions
and future work.</p>
    </sec>
    <sec id="sec-2">
      <title>Proposed Approaches</title>
      <sec id="sec-2-1">
        <title>Task 1 - Tweet Check-Worthiness in English and Arabic</title>
<p>The purpose of this task is, given a stream of tweets and one or more topics, to sort
the tweets according to their check-worthiness for the topic to which they are
supposed to belong. Three topics have been provided for the Arabic language:
"The protests in Lebanon", "Wassim Youssef" and "Turkey enters Syria", and
all English language tweets belong to the topic "COVID-19". The official evaluation
measure for the Arabic language is P@30 and for the English language it is MAP.</p>
<p>To address these two versions of task 1, five different models have been
analyzed. All models use cross entropy as a loss function and accuracy as a
metric4.</p>
<p>Model 1 The first of these models5 is a feed forward neural network (FFNN)
whose input is made up of word embeddings of the first n words of the tweet
text, followed by a 1D global max pooling layer that operates on the n word
vectors, a hidden layer, and a final sigmoid layer of size 1 that provides the
check-worthiness score between 0 and 1.</p>
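<p>The pooling step of Model 1 can be sketched in plain Python: an embedding lookup for the first n tokens followed by 1D global max pooling, i.e. an element-wise maximum over the n word vectors. The toy vocabulary and 4-dimensional embedding values below are invented for illustration; the real model uses trained or pretrained embeddings and a dense sigmoid head.</p>

```python
# Illustration of Model 1's input reduction: embed the first n tokens,
# then apply 1D global max pooling (element-wise max across tokens).
# The vocabulary and embedding values below are toy placeholders.

TOY_EMBEDDINGS = {
    "vaccine": [0.9, 0.1, 0.0, 0.3],
    "cures":   [0.2, 0.8, 0.1, 0.0],
    "covid":   [0.1, 0.2, 0.9, 0.5],
}
UNK = [0.0, 0.0, 0.0, 0.0]  # vector for out-of-vocabulary tokens

def global_max_pool(tokens, n=50):
    """Element-wise max over the embeddings of the first n tokens."""
    vectors = [TOY_EMBEDDINGS.get(t, UNK) for t in tokens[:n]]
    return [max(col) for col in zip(*vectors)]

pooled = global_max_pool(["vaccine", "cures", "covid"])
# pooled is a single 4-dim vector fed to the hidden and sigmoid layers
```

<p>Whatever the tweet length (up to n), the pooled output always has the embedding dimension, which is what lets a fixed-size dense layer follow it.</p>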
        <sec id="sec-2-1-1">
<title>3 Identified as NLPIR01 in the official results.</title>
        </sec>
        <sec id="sec-2-1-2">
<title>4 We plan to release the source code at https://github.com/jrmtnez/NLP-IR-UNED-at-CheckThat-2020.</title>
        </sec>
        <sec id="sec-2-1-3">
<title>5 All models have been implemented in TensorFlow 2.1 with Keras:</title>
          <p>
https://www.tensorflow.org
Embedding features Word embeddings can be self-generated during training,
or they can be preloaded at startup. For this second option, English pretrained
Twitter Glove[
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] embeddings of dimension 2006 and Glove Arabic embeddings
of dimension 2567 have been used respectively in each version of task 1.
          </p>
          <p>Additionally, in the Arabic version of task 1, the title and the description of
the topic to which each tweet belongs have been concatenated to the text of the
tweet to form the model input.</p>
          <p>
To preprocess the raw Arabic text, we use as tokenizer the simple_word_tokenize
function from Camel Tools[
            <xref ref-type="bibr" rid="ref4">4</xref>
            ]. The first 25 tokens of each tweet and the first 100 tokens
of the topic title and topic description concatenation have been selected to form
an input of size 125.
          </p>
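<p>A minimal sketch of how such a fixed-size input could be assembled, assuming truncation of long sequences and padding of short ones (the padding token and helper name are our own, not stated in the paper):</p>

```python
# Sketch of assembling a fixed 125-token input for the Arabic task:
# the first 25 tweet tokens plus the first 100 tokens of the topic
# title + description, padded with a placeholder when shorter.
PAD = "<pad>"

def build_input(tweet_tokens, topic_tokens, tweet_len=25, topic_len=100):
    head = (tweet_tokens[:tweet_len] + [PAD] * tweet_len)[:tweet_len]
    tail = (topic_tokens[:topic_len] + [PAD] * topic_len)[:topic_len]
    return head + tail  # always length tweet_len + topic_len

x = build_input(["a", "b"], ["c"] * 200)
# len(x) == 125 regardless of the raw lengths
```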
          <p>
            For the English language, the NLTK8[
            <xref ref-type="bibr" rid="ref5">5</xref>
] tokenizer has been used, and the
first 50 tokens have been selected as input.
          </p>
<p>Graph features Taking advantage of the fact that the organizers have provided
the complete information of each tweet in a JSON file, we have implemented a
process to increase the size of the input with tweets related to the current tweet. To
do this, from the information of the tweets contained in the training, dev and test
datasets, we extract triples of type tweet-hashtags-hashtag, tweet-quoted-tweet,
tweet-reply status-tweet, tweet-contain url-url and tweet-has mention-user, and
build a graph per dataset with these triples.</p>
<p>After this, for each tweet present in the datasets, we search for the first three
tweets that are neighbors of the current one; for example, we would go from the
tweet node to a hashtag node and from this we would select three tweet nodes.
If there were no hashtag neighbor nodes, or the hashtag neighbor nodes did not
have three or more related tweet nodes, from the initial tweet node we search for
tweet nodes behind the relations quoted, reply status, contain url and has mention,
in this order. The result is that the texts of up to three tweets can be concatenated
to the text of the original tweet.</p>
<p>As we will see later in section 3.1, the model can be run either
with the inputs from a single instance, or with the inputs concatenated from
several instances if we make use of the tweet graph.</p>
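<p>The neighbor-expansion procedure above can be sketched with a plain adjacency dictionary instead of a graph library; relations are tried in the order the text describes until up to three neighboring tweets are collected. All node ids and the dictionary layout are toy assumptions, not the paper's actual implementation.</p>

```python
# Minimal sketch of the tweet-expansion idea: walk from a tweet to an
# intermediate node (hashtag, quoted tweet, url, ...) and back to other
# tweets, trying relations in the paper's stated order.
RELATION_ORDER = ["hashtag", "quoted", "reply_status", "contain_url", "has_mention"]

def related_tweets(graph, tweet_id, limit=3):
    """Collect up to `limit` tweets reachable through one intermediate node."""
    found = []
    for rel in RELATION_ORDER:
        for middle in graph.get((tweet_id, rel), []):
            for other in graph.get((middle, rel), []):
                if other != tweet_id and other not in found:
                    found.append(other)
                if len(found) == limit:
                    return found
    return found

# Toy graph: tweet t1 shares the hashtag #covid with t2, t3 and t4.
graph = {
    ("t1", "hashtag"): ["#covid"],
    ("#covid", "hashtag"): ["t1", "t2", "t3", "t4"],
}
neighbors = related_tweets(graph, "t1")
# the texts of these tweets would be concatenated to t1's text
```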
<p>Model 2 The second model is a CNN fed by the same features as Model 1.
The input and embedding layers are followed by two pairs of 1D convolutional
and 1D max pooling layers, a flatten layer that feeds a dense layer, and finally
a dense sigmoid layer of size 1 at the output. Each 1D convolutional layer has a
kernel size of 5 and the 1D max pooling layers have a pool size of 5.
Model 3 The third model is an LSTM network fed by the same features as Model
1. In this case, after the input and embedding layers there is an LSTM layer that
feeds directly the dense sigmoid layer of size 1 that forms the output.</p>
        </sec>
        <sec id="sec-2-1-4">
          <title>6 https://nlp.stanford.edu/projects/glove/</title>
        </sec>
        <sec id="sec-2-1-5">
          <title>7 https://github.com/tarekeldeeb/GloVe-Arabic</title>
        </sec>
        <sec id="sec-2-1-6">
          <title>8 https://www.nltk.org/</title>
<p>Model 4 The fourth model is a Bi-LSTM network fed by the same features as
the previous models. After the input and embedding layers there are two
bidirectional LSTM layers followed by a dense layer that precedes the output sigmoid
layer.</p>
<p>Model 5 The last model is again an FFNN, but in this case the input is made
up of tf-idf vectors. To build this input we have followed the same strategy as
in the embedding and graph features of the previous models: in the Arabic language
the title and the description of the topic have been concatenated to the tweet text,
and up to three additional tweets have been added to the input from the graph
extracted from the tweet information.</p>
<p>The network is made up of three pairs of dense and batch normalization
layers, and the size of the dense layers decreases by 50% in each stage. These six
layers are followed by a batch normalization layer, a dropout layer, and finally
a sigmoid output layer.</p>
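<p>For readers unfamiliar with the tf-idf input of Model 5, a stdlib-only sketch of the weighting follows. The paper does not say which vectorizer was used; an off-the-shelf one (e.g. scikit-learn's TfidfVectorizer) would be the usual choice, and the formula below is the plain tf × idf variant rather than any library's smoothed version.</p>

```python
# Hedged sketch of tf-idf weighting: term frequency within a document
# times the log inverse document frequency across the corpus.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return out

vecs = tfidf_vectors([["fake", "news"], ["fake", "claim"]])
# "fake" appears in every document, so its idf (and tf-idf weight) is 0
```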
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Task 5 - Debate Check-Worthiness in English</title>
<p>The objective of this task is, given a transcribed political debate segmented into
sentences and with the speakers annotated, to sort the sentences according to their
check-worthiness. The official evaluation measure for this task is MAP.</p>
<p>In this task, the five models described in section 2.1 have been used with slight
variations. First, FFNN models have also been run without a hidden layer. On
the other hand, in addition to word embeddings and tf-idf vectors, another type
of input data has been used in this model, as we explain below.</p>
        <p>
          To perform a text analysis of the sentences we have prepared a version of
the English Regressive Imagery Dictionary (RID)[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ][
          <xref ref-type="bibr" rid="ref7">7</xref>
] with a format compatible
with that used by the liwc Python module9. This dictionary contains 3150 words and
roots in 48 categories, and these in turn are grouped into three main categories:
primary, secondary and emotion.
        </p>
<p>The input vector consists of 51 decimal numbers, one for each category. For
each instance of the training and test datasets, we calculate the percentage of
words that are in those 51 categories.</p>
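<p>The per-category percentage computation can be illustrated as follows. The two categories and their word lists are invented placeholders; the real Regressive Imagery Dictionary has 48 categories plus the 3 main groupings, yielding the 51-value vector described above.</p>

```python
# Toy sketch of the RID-style feature vector: the fraction of a
# sentence's tokens falling in each dictionary category.
TOY_RID = {
    "primary_need": {"drink", "eat", "thirst"},
    "emotion_anger": {"rage", "fight", "hate"},
}

def rid_features(tokens):
    """Return {category: fraction of tokens belonging to it}."""
    total = len(tokens)
    return {cat: sum(t in words for t in tokens) / total
            for cat, words in TOY_RID.items()}

feats = rid_features(["they", "hate", "to", "fight"])
# 2 of 4 tokens are in emotion_anger -> 0.5; none in primary_need -> 0.0
```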
      </sec>
      <sec id="sec-2-3">
        <title>Task 2 - Claim Retrieval in English</title>
<p>In this task, for each check-worthy tweet provided, a ranked list of claims must be
returned, reflecting which claims best support that tweet. The official evaluation
measure for this task is MAP@5.</p>
        <p>
          In our approach we have used an FFNN similar to model 5 described above,
and for this model we build a dataset in the following way: for each claim we
calculate its sentence embedding vc with Universal Sentence Encoder[
          <xref ref-type="bibr" rid="ref8">8</xref>
], the
ratio of different tokens to total tokens rtc, the average number of characters
        </p>
        <sec id="sec-2-3-1">
          <title>9 https://pypi.org/project/liwc/</title>
<p>per word avcc, the number of verbs vnc, the number of nouns nnc, the ratio
of content words10 rcwc with respect to the total number of words, and the ratio of
content tags11 rctc with respect to the total number of tags.</p>
          <p>The same values vt, rtt, avct, vnt, nnt, rcwt, rctt are calculated for each tweet,
and for each claim title vct, rtct, avcct, vnct, nnct, rcwct, rctct.</p>
          <p>Combining claims and tweets and claim titles and tweets we obtain the
features for our dataset shown in table 1.</p>
<p>Claim - Tweet: simct = cosine sim(vc, vt); drtct = rtc - rtt; davcct = avcc - avct; dvnct = vnc - vnt; dnnct = nnc - nnt; drcwct = rcwc - rcwt; drctct = rctc - rctt.</p>
<p>Claim title - Tweet: simctt = cosine sim(vct, vt); dctctt = rtct - rtt; davcctt = avcct - avct; dvnctt = vnct - vnt; dnnctt = nnct - nnt; drcwctt = rcwct - rcwt; drctctt = rctct - rctt.</p>
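<p>A stdlib-only sketch of the Claim - Tweet feature construction: cosine similarity between the two sentence embeddings plus differences of the scalar stylometric values. The variable names mirror the paper's; the embedding vectors are toy stand-ins for Universal Sentence Encoder output, and the dict-based data layout is our own assumption.</p>

```python
# Sketch of building the seven Claim - Tweet features.
import math

def cosine_sim(a, b):
    """Cosine similarity of two 2-dim vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def claim_tweet_features(claim, tweet):
    """Each argument: dict with 'v' (embedding) and scalar stats
    rt, avc, vn, nn, rcw, rct as defined in the text."""
    feats = {"simct": cosine_sim(claim["v"], tweet["v"])}
    for k in ("rt", "avc", "vn", "nn", "rcw", "rct"):
        feats["d" + k + "ct"] = claim[k] - tweet[k]
    return feats

f = claim_tweet_features(
    {"v": [1.0, 0.0], "rt": 0.8, "avc": 4.2, "vn": 2, "nn": 5, "rcw": 0.5, "rct": 0.4},
    {"v": [1.0, 0.0], "rt": 0.6, "avc": 4.0, "vn": 1, "nn": 3, "rcw": 0.4, "rct": 0.3},
)
# identical embeddings -> simct == 1.0
```

<p>The Claim title - Tweet features are built the same way, swapping in the claim-title statistics.</p>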
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <sec id="sec-3-1">
        <title>Task 1 - Tweet Check-Worthiness in English and Arabic</title>
<p>In task 1, for both the English and Arabic languages, we have increased the size of
the input by relying on the creation of a graph with which we retrieve tweets
related to each instance of the datasets, based on the hypothesis that the
relationships tweet-hashtags-hashtag, tweet-quoted-tweet, tweet-reply status-tweet,
tweet-contain url-url and tweet-has mention-user can enrich the information provided
to the different classifiers. To verify this, we have performed a grid search with
different parameters that were applied or not according to the model used. These
parameters have been: graph generated inputs, pretrained embeddings, number
of epochs, batch size, hidden layer size, activation type, optimizer type and
dropout. All models use the adam optimizer with its
default parameter values except model 4, which uses nadam.</p>
<p>In Arabic there was no dev dataset so we extracted 20% of the training
dataset instances as a dev dataset.
10 Nouns, verbs, adjectives and adverbs.
11 "NN", "NNS", "NNP", "NNPS", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ",
"JJ", "JJR", "JJS", "RB", "RBR", "RBS" and "WRB".</p>
<p>After observing the results obtained for the Arabic language, we can see that
in almost all cases the use of Glove Arabic embeddings improves the result, and
that the expansion of inputs through the tweet graph improves five of the nine
possible configurations, one of them (FFNN + Graphs + Embeddings) being the
one that obtains the best global result.</p>
<p>For the English language it is again confirmed that the use of Glove
embeddings improves performance in all cases. The graph features do not show a
homogeneous behavior, although the best value is obtained for the Bi-LSTM
model with embeddings and graph features.</p>
<p>Regarding the official results, in the Arabic language our best run was the
Bi-LSTM model with Glove Arabic embeddings and graph features, sent as
contrastive2, which obtained a P@30 of 0.5333, ranking 13th out of 28 runs sent by
all the teams, while the FFNN (primary) and CNN (contrastive1) models with
the same features obtained a P@30 of 0.3917 (ranking 19th) and 0.4833 (ranking
16th) respectively. In the English language our best run was the Bi-LSTM model
(primary), which obtained a MAP of 0.6069 (ranking 20th), and the FFNN
(contrastive1) and CNN (contrastive2) models obtained a MAP of 0.5546 (ranking
22nd) and 0.5193 (ranking 23rd), respectively.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Task 5 - Debate Check-Worthiness in English</title>
<p>In task 5, different parameters and settings have also been analyzed. In this case,
a dev dataset was not available either, so we have partitioned the training dataset
leaving 20% of it as a dev dataset. Given the great imbalance of classes (4027
negative instances and 42 positive instances) we have opted for an oversampling
strategy to equalize the number of positive and negative instances. In this task,
an additional FFNN model 6 that makes use of features derived from a RID text
analysis has been used.</p>
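<p>One minimal way to realize the oversampling strategy just described is to duplicate positive instances, sampled with replacement, until both classes have the same count. This is a sketch under that assumption; the paper does not specify the exact resampling scheme.</p>

```python
# Oversampling sketch: replicate minority-class (label 1) instances
# until the class counts match, then shuffle. Assumes binary labels.
import random

def oversample(instances, labels, seed=0):
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    extra = [rng.choice(pos) for _ in range(len(neg) - len(pos))]
    idx = list(range(len(labels))) + extra
    rng.shuffle(idx)
    return [instances[i] for i in idx], [labels[i] for i in idx]

X, y = oversample(["a", "b", "c", "d"], [0, 0, 0, 1])
# y now contains three 1s and three 0s
```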
<p>Table 4 shows the best results obtained for the different combinations of the
use of pretrained embeddings and oversampling, and the optimal parameters
for each model. All models use the adam optimizer with its default parameter
values.</p>
<p>As we can see, the use of oversampling in general does not improve the
behavior of the different models. The best results in FFNN models are obtained
with the use of RID-based features without oversampling, outperforming FFNN
models with inputs based on embeddings and TF/IDF vectors. It is also clearly
seen, as was the case in task 1, that the use of 6B-100D Glove pretrained
embeddings substantially improves the performance of all the models in which it can be
used.</p>
<p>The model that outperforms the rest by far is the Bi-LSTM with Glove
embeddings. All of the runs that we submitted for task 5 were based on this
model. The contrastive2 run used oversampling, while the primary and
contrastive1 runs did not use oversampling and shared the same parameters; the
only difference between them was a different random weight initialization.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Task 2 - Claim Retrieval in English</title>
        <p>In this task we have used an FFNN with 1000 elu12 units in the first hidden
layer, 500 elu units in the second hidden layer and 250 elu units in the last
hidden layer, and Universal Sentence Encoder embeddings of tweets, claims and
claim titles. We have used the adam optimizer with its default values and we
have trained the model for 50 epochs with a batch size of 128.</p>
<p>In our experiments on the dev dataset we were able to verify that the use of
the features derived from the claim title does not provide improvements in the
performance offered by the claim-tweet features. Table 6 shows the results obtained
with both configurations. Both sets of features exceed the MAP@5 of 0.609
obtained by the baseline based on Elasticsearch provided by the organization.</p>
<p>We have submitted three runs with two different settings. In the first one,
sent as primary, we have made use of the seven features described in section 2.3
involving claims and tweets. With this configuration we obtained a MAP@5 of
0.8560, placing our team in fourth place.</p>
<p>In the contrastive1 run we used all features, and the contrastive2 run was
identical to the primary run but with a different random initialization. With these
two configurations we obtained respectively a MAP@5 of 0.8390 and 0.8550,
confirming that the seven Claim title - Tweet features do not improve performance
when used together with the seven Claim - Tweet features.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future Work</title>
<p>In this paper, we present our approach to the tasks tweet check-worthiness,
debate check-worthiness and claim retrieval at the CLEF 2020 CheckThat! Lab.</p>
<p>Examining the results obtained in the two check-worthiness tasks in which
we have participated (three if we take into account that the first was defined in
two different languages), we can see that the use of pretrained embeddings of the
appropriate language significantly improves the performance of the models with
respect to generating these embeddings during the training phase.</p>
<p>In the two versions of task 1, we have used a particular method to increase
the information that reaches the input of the models, using a graph constructed
from the tweets provided in the datasets that collects the relationships
tweet-hashtags-hashtag, tweet-quoted-tweet, tweet-reply status-tweet, tweet-contain
url-url and tweet-has mention-user between tweets. Although this mechanism has
not behaved in a homogeneous way throughout the different models, it has been
the one that has obtained the highest MAP values on the dev dataset in both
the Arabic and English versions of this task.</p>
      <p>
In task 5 we have made use of RID-based features and, although we have not
sent any run with them, in our tests with FFNN models on the dev dataset we
have obtained good results compared to embeddings and tf-idf based features,
so this type of text analysis-based features can be an alternative to more
well-known ones like LIWC[
        <xref ref-type="bibr" rid="ref9">9</xref>
]. We did not have these features prepared in time for
task 1, but we think they can be applied to tweet texts and this will be a job to be
done in the future.
      </p>
      <p>MAP values in this task are certainly low. We think that the large class
imbalance and the small number of positive instances can cause these instances
to not be correctly characterized by the models.</p>
<p>In the claim retrieval task we have assumed that the similarity between claim,
claim title and tweet would allow a selection of claims appropriate to the
requirements of this task, and to this end we have implemented a mixed strategy using
sentence embeddings and stylometric features based on token counts and ratios,
far exceeding the provided baseline with these features.</p>
<p>On the other hand, the balance of our participation in these tasks has been
positive, obtaining first place in task 5, fourth in task 2 and more discreet results
in task 1.</p>
      <p>
        In future work we plan to continue experimenting with graph features to
increase the size of the inputs or the number of training instances, combining this
strategy with more sophisticated language representations such as BERT[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Alberto</given-names>
            <surname>Barron-Cedeno</surname>
          </string-name>
          , Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino, Maram Hasanain, Reem Suwaileh, Fatima Haouari, Nikolay Babulkov, Bayan Hamdan, Alex Nikolov, Shaden Shaar, and Zien Sheikh Ali. Overview of CheckThat! 2020:
<article-title>Automatic Identification and Verification of Claims in Social Media</article-title>
. arXiv:2007.07997 [cs],
<year>July 2020</year>
.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alberto</surname>
          </string-name>
Barron-Cedeño, Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino, Maram Hasanain, Reem Suwaileh, and Fatima Haouari. CheckThat! at CLEF 2020:
<article-title>Enabling the Automatic Identification and Verification of Claims in Social Media</article-title>
. In Joemon M. Jose, Emine Yilmaz, João Magalhães, Pablo Castells, Nicola Ferro,
          <string-name>
            <given-names>Mario J.</given-names>
            <surname>Silva</surname>
          </string-name>
          , and Flavio Martins, editors,
          <source>Advances in Information Retrieval, Lecture Notes in Computer Science</source>
          , pages
          <volume>499</volume>
-
          <fpage>507</fpage>
          ,
          <string-name>
            <surname>Cham</surname>
          </string-name>
          ,
          <year>2020</year>
          . Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
3. Jeffrey Pennington, Richard Socher, and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          . Glove:
          <article-title>Global vectors for word representation</article-title>
          .
          <source>In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          , pages
          <fpage>1532</fpage>
-
          <fpage>1543</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Ossama</given-names>
            <surname>Obeid</surname>
          </string-name>
          , Nasser Zalmout, Salam Khalifa, Dima Taji, Mai Oudah, Bashar Alhafni, Go Inoue, Fadhl Eryani, Alexander Erdmann, and Nizar Habash.
          <source>CAMeL Tools: An Open Source Python Toolkit for Arabic Natural Language Processing. page 11</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Edward</given-names>
            <surname>Loper</surname>
          </string-name>
          and
          <string-name>
            <given-names>Steven</given-names>
            <surname>Bird</surname>
          </string-name>
          .
          <article-title>NLTK: The Natural Language Toolkit</article-title>
          . arXiv:cs/0205028, May
          <year>2002</year>
          . arXiv: cs/0205028.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Colin</given-names>
            <surname>Martindale</surname>
          </string-name>
. Romantic Progression: The Psychology of Literary History. Hemisphere, Washington, DC,
<year>1975</year>
.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Colin</given-names>
            <surname>Martindale</surname>
          </string-name>
          .
<article-title>The clockwork muse: The predictability of artistic change</article-title>
          .
          <source>Basic Books</source>
          , New York, NY, US,
          <year>1990</year>
          . Pages: xiv,
          <fpage>411</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Cer</surname>
          </string-name>
          , Yinfei Yang,
          <string-name>
            <surname>Sheng-yi Kong</surname>
            , Nan Hua, Nicole Limtiaco, Rhomni St John, Noah Constant, Mario Guajardo-Cespedes,
            <given-names>Steve</given-names>
          </string-name>
          <string-name>
            <surname>Yuan</surname>
            , and
            <given-names>Chris</given-names>
          </string-name>
          <string-name>
            <surname>Tar</surname>
          </string-name>
          .
          <article-title>Universal sentence encoder</article-title>
          .
          <source>arXiv preprint arXiv:1803.11175</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>James</surname>
            <given-names>W.</given-names>
          </string-name>
          <string-name>
            <surname>Pennebaker</surname>
          </string-name>
          , Ryan L. Boyd, Kayla
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>and Kate</given-names>
          </string-name>
          <string-name>
            <surname>Blackburn</surname>
          </string-name>
          .
          <article-title>The development and psychometric properties of LIWC2015</article-title>
          .
          <source>Technical report</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Jacob</surname>
            <given-names>Devlin</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ming-Wei</surname>
            <given-names>Chang</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>and Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
<article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>