<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop - April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>ClaimHunter: An unattended tool for automated claim detection on Twitter</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Javier Beltrán</string-name>
          <email>javier.beltran@newtral.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rubén Míguez</string-name>
          <email>ruben.miguez@newtral.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irene Larraz</string-name>
          <email>irene.larraz@newtral.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fact-checking</institution>
          ,
          <addr-line>Newtral, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>R&amp;D Department</institution>
          ,
          <addr-line>Newtral, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>14</volume>
      <issue>2021</issue>
      <abstract>
<p>As political campaigns have moved from traditional media to social networks, fact-checkers must adapt how they work. The explosion of information (and disinformation) on social networks makes it impossible to manually fact-check every piece of data. With this reality in mind, Newtral, a fact-checking organization, has developed its own automated monitoring tool for Twitter: ClaimHunter. Recently, deep learning approaches have obtained very high performance across many different NLP tasks, and automated claim detection is no exception: these models show promising results in fact-checking scenarios without task-specific feature engineering. Based on the BERT architecture, ClaimHunter's models reach an 80% F1 score when tested in real-life scenarios with expert fact-checkers. Through a simple UI deployed on Slack, ClaimHunter notifies journalists and gathers feedback from their day-to-day work to improve the performance of the algorithm. Launched 6 months ago, ClaimHunter has processed more than 130,000 tweets, expanding Newtral's operation beyond national politicians to the regional and local level. The number of reviewed claims per day has increased tenfold since the adoption of ClaimHunter by Newtral's fact-checking team. This paper focuses on the challenges of building such a system inside a fact-checking organization: data labelling and fact-checker alignment, system architecture, testing and deployment of the underlying models, and continuous feedback and model refinement.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>CCS Concepts</title>
      <p>• Computing methodologies → Artificial intelligence; Natural language processing; Information extraction; • Human-centered computing → HCI; HCI design and evaluation methodologies; Field studies; • Information systems → Information retrieval; Evaluation of retrieval results; Presentation of retrieval results.</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
<p>In the journalism domain, we define fact-checking as the task of assessing whether a claim made by a public figure is true or not. This is a complex activity normally performed by trained professionals (fact-checkers) who must evaluate known facts and data published by official institutions to reach a final verdict.</p>
      <p>
        The fact-checking process involves four main steps: 1)
monitoring of relevant sources; 2) spotting facts; 3) data
verification; and 4) publication [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This paper focuses on the first
two steps.
      </p>
      <p>
        Monitoring and spotting claims is a time-consuming task
which inevitably must be automated. Despite the importance of
this component for fact-checking, automated claim detection
algorithms are still in an early stage [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, the
increased demand for fact-checking and the recent advances of
deep learning techniques for NLP has stimulated a rapid progress
in developing tools and systems to automate parts of this task
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        To automatically decide which information is fact-checked, a
definition of a check-worthy claim must be agreed. However, the
concept of check-worthiness lacks of an agreed definition on
scientific literature resulting in inconsistent and unreliable
datasets [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Frequently researchers build training datasets based
only on claims published by fact-checkers on their web sites. This
conceptualization is flawed by design. Per each published content,
fact-checkers have to spot and review dozens of potentially
checkworthy claims. Only by having access to this internal work large
non biased datasets can be built. Besides, to precisely identify a
check-worthy statement is not an easy task. Expert knowledge is
needed. Check-worthy claims normally are: objective, do not
contain information that is common knowledge, establish some
sort of comparison and are verifiable with data. As the political
discourse moved to Twitter, tweets have become the object of
study for fact-checkers across the world. ClaimHunter is our
solution to automate the detection of relevant tweets. In this work,
we define a tweet as check-worthy if there is at least one
checkworthy claim on it.
      </p>
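<p>To make the criteria above concrete, a small, purely illustrative set of labeled examples follows. The tweets and the helper name are invented for illustration; they are not taken from the ClaimHunter dataset.</p>

```python
# Purely illustrative, invented examples of the check-worthiness criteria
# described above; these tweets are NOT from the ClaimHunter dataset.
EXAMPLES = [
    # Objective, establishes a comparison, verifiable with data -> check-worthy
    ("Unemployment has doubled since 2019.", True),
    # Objective and verifiable against official statistics -> check-worthy
    ("Our region invests 3% of its budget in education.", True),
    # Pure opinion, not verifiable -> not check-worthy
    ("We will keep working for the people!", False),
    # Common knowledge, nothing to verify -> not check-worthy
    ("Today is election day in Madrid.", False),
]

def positives(examples):
    """Return only the tweets labeled as check-worthy."""
    return [text for text, label in examples if label]
```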
<p>The structure of this paper is as follows. We begin (in Section 2) by reviewing the most relevant initiatives in the automated claim detection field. We then explore (in Section 3) how ClaimHunter was built, including the training dataset, the system architecture and an overall description of its UI. Next (in Section 4) we describe our learning model, evaluate it with performance metrics and explore its evolution through time. Finally (in Section 5) we draw our main conclusions after testing the system for 6 months and outline some research lines for the near future.</p>
    </sec>
    <sec id="sec-3">
      <title>Related work</title>
      <p>
        One of the first and most well-known approaches to claim
detection is ClaimBuster [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It is built on a large annotated
dataset of factual sentences taken from American presidential
debates. It uses a machine learning algorithm based on SVM
combining TF-IDF, POS tagging and NER features. Another
known approach to claim detection is ClaimRank [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] supporting
both English and Arabic. It is built on a dataset of factual
sentences published by 9 different fact-checking organizations
and applies a multi-task learning setup. Main novelty on this
paper is the inclusion of a variety of contextual and
discoursebased features. ClaimRank can mimic the claim selection
strategies of each of them or the union of them all.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] authors collaborated with FullFact, an independent
factchecking organization, to create a classifier based on universal
sentence representations. The system was tested with real
factcheckers through a live feed of transcripts from TV called “Live”.
Squash [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], system developed by Duke Reporters’ Labrom, is
another proposal to live fact-checking where ClaimBuster
algorithms are combined with ElasticSearch to promote real-time
search of previously verified claims.
      </p>
      <p>
        From a brief review of system architectures for automated
fact-checking in the scientific literature we observe that a
combination of deep neural networks (DNNs), non-DNNs and
heuristic approaches are commonly employed. The Fact
Extraction and VERification (FEVER) dataset [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] enables the
development of data-driven neural approaches to the automatic
fact checking task. Additionally, the FEVER Shared Task [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
introduced a benchmark, the first of this kind, to evaluate both
evidence retrieval and claim verification tasks. The CheckThat!
Lab at the Conference and Labs of the Evaluation Forum (CLEF),
different research groups compete to create claim verification
models. The workshop proposes four complementary tasks,
offered in English, Arabic and Spanish. One of those tasks focuses
on identifying which tweets in a Twitter stream are worth
factchecking [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Latest studies on this topic explore the fusion of
syntactic features and BERT embeddings, to classify
checkworthiness of tweets with promising results [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>In ClaimHunter, we have fine-tuned a BERT model for the check-worthiness task using a large dataset (+30K tweets) annotated by expert fact-checkers. As far as we know, this is the first work where such a model has been developed by fact-checkers and tested in real-life scenarios over a 6-month period. The next sections describe how this system works.</p>
    </sec>
    <sec id="sec-4">
      <title>System architecture</title>
      <p>ClaimHunter is a monitoring tool for Twitter which accelerates the traditional fact-checking process by automatically detecting check-worthy content. The behavior of ClaimHunter consists of the following steps, also summarized in Figure 1:</p>
      <p>1. Expert fact-checkers establish the Twitter accounts to monitor based on their public relevance.</p>
      <p>2. ClaimHunter retrieves tweets from the selected accounts in real time via the Twitter API.</p>
      <p>3. ClaimHunter’s detector classifies tweets as positive if they contain a check-worthy claim, or negative otherwise.</p>
      <p>4. Positive tweets are sent to Slack. A small, randomly chosen fraction of negative tweets is also sent (see Section 3.1).</p>
      <p>5. Fact-checkers review the tweets classified as check-worthy and confirm or reject the predicted labels.</p>
      <p>6. Both the predicted label and the manual feedback from fact-checkers are stored in the database.</p>
      <p>7. New labeled samples from step 6 are used to retrain the model.</p>
      <p>ClaimHunter follows a supervised learning approach. Supervised machine learning requires annotated datasets for the objective task, but human labeling is expensive and slow, and journalism expertise is needed to create a good-quality dataset. Besides, claim detection is a highly unbalanced classification problem where non-check-worthy claims (the negative class) are by far more common than check-worthy ones (the positive class). Our empirical estimate is that only 10-15% of our Twitter feed consists of check-worthy tweets. Our initial model was built on a dataset of 5,000 tweets manually annotated by 3 fact-checkers independently. Only tweets considered check-worthy by at least 2 fact-checkers were labeled as check-worthy. We followed an iterative approach, launching new model releases as new data became available through the feedback loop integrated in the system. After 6 months of ClaimHunter running at Newtral, our dataset has grown from 5K to more than 30K tweets.</p>
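<p>The routing step of the loop above can be sketched as follows. This is a minimal illustration, not Newtral's production code: the function names (route_tweet, classify, notify, store) are our own stubs standing in for the real classifier, the Slack notification and the database write.</p>

```python
import random

NEGATIVE_SAMPLE_RATE = 0.30  # fraction of negative predictions sent for review

def route_tweet(tweet_text, classify, notify, store, rng=random.random):
    """One iteration of the ClaimHunter loop: classify a tweet, notify
    fact-checkers when appropriate, and store the prediction for retraining.

    classify, notify and store are stand-ins for the real model, the Slack
    alert and the database write, respectively.
    """
    is_positive = classify(tweet_text)            # step 3: run the detector
    if is_positive or rng() < NEGATIVE_SAMPLE_RATE:
        notify(tweet_text, is_positive)           # step 4: send to Slack
    store(tweet_text, predicted=is_positive)      # step 6: persist prediction
    return is_positive
```

Passing `rng` explicitly keeps the 30% negative sampling testable and deterministic when needed.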
      <p>Besides positive tweets, ClaimHunter sends to Slack a small random fraction of the negative predictions, which is also reviewed by fact-checkers. This prevents biasing the dataset by adding only tweets originally predicted as positive. Only a fraction is sent because negative tweets are the majority class, so we progressively undersample it, making check-worthy tweets more representative for training. The negative fraction is a predefined parameter set to 30%. Fact-checkers' selection of which claims to review and which to ignore can be biased by their interests, expertise, workload and other contextual factors. To minimize this issue, journalists were asked to label claims based on their factuality and not their journalistic relevance.</p>
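<p>Forwarding every positive prediction but only 30% of the negatives shifts the class balance of the reviewed feed. The short computation below illustrates this, assuming for the sake of the example a 12% base rate of check-worthy tweets, a value within the 10-15% range estimated above.</p>

```python
# Expected share of check-worthy tweets in the reviewed (and hence training)
# feed, when all positives and 30% of negatives are forwarded to Slack.
# The 12% base rate is an illustrative value within the 10-15% estimate.
base_positive_rate = 0.12
negative_fraction_sent = 0.30

forwarded_positives = base_positive_rate
forwarded_negatives = (1 - base_positive_rate) * negative_fraction_sent
positive_share = forwarded_positives / (forwarded_positives + forwarded_negatives)

print(round(positive_share, 4))  # 0.3125
```

Under these assumptions the reviewed feed is roughly 31% positive, in line with the 32% positive labels reported for the final dataset in Section 4.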
    </sec>
    <sec id="sec-5">
      <title>Automated Claim Detection in the newsroom</title>
      <p>Developed as a Slack app, ClaimHunter monitors Twitter accounts and sends alerts to a private channel (#tweets by default) when a new tweet is classified as check-worthy. Newtral’s fact-checkers review its content and give feedback on its check-worthiness as part of their daily workflow. The UI offers three options:</p>
      <p>• Rejected: the prediction was wrong. The selected tweet does not contain any claim.</p>
      <p>• Reviewed: the prediction was right. The selected tweet contains a claim, but there is no interest in publishing a fact-check on it.</p>
      <p>• Selected: the prediction was right. The selected tweet contains a claim and the fact-checker proposes it for a fact-check.</p>
      <p>The “Reviewed” category allows including a greater number of check-worthy claims in the training dataset, independently of their potential for publication.</p>
      <p>When a fact-checker marks a tweet as “Selected”, a copy of the tweet is automatically sent to a new channel (#fact-check by default). The head of the fact-checking unit reviews the content in this channel and makes a final decision on which content is promoted to a fact-check story. Rejected and reviewed tweets are removed from the Slack feed.</p>
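<p>A minimal sketch of what such a Slack alert could look like, using Slack's Block Kit message format. The channel name matches the default above, but the action ids and helper name are our own assumptions; actually posting the message would go through slack_sdk's WebClient.chat_postMessage, which is not shown here.</p>

```python
# Sketch of a ClaimHunter-style Slack alert for a check-worthy tweet, in
# Slack's Block Kit format. Action ids and the helper name are assumptions;
# sending would use slack_sdk's WebClient.chat_postMessage (not shown).
def build_alert(tweet_url, tweet_text):
    """Build a Slack message offering the three feedback options above."""
    return {
        "channel": "#tweets",  # default alert channel
        "text": tweet_text,    # fallback text for notifications
        "blocks": [
            {"type": "section",
             "text": {"type": "mrkdwn",
                      "text": f"<{tweet_url}|New check-worthy tweet>\n{tweet_text}"}},
            {"type": "actions",
             "elements": [
                 {"type": "button", "action_id": "rejected",
                  "text": {"type": "plain_text", "text": "Rejected"}},
                 {"type": "button", "action_id": "reviewed",
                  "text": {"type": "plain_text", "text": "Reviewed"}},
                 {"type": "button", "action_id": "selected",
                  "text": {"type": "plain_text", "text": "Selected"}},
             ]},
        ],
    }
```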
    </sec>
    <sec id="sec-6">
      <title>Claim Detection experiments</title>
      <p>
        ClaimHunter tackles claim detection as a binary classification problem with a strong class imbalance, the positive class being the minority. However, this issue can be attenuated via the feedback mechanism described above. We follow standard experimentation practices, splitting the dataset into training (80%), validation (10%) and test (10%) sets. The test set is used for evaluation only, and our main evaluation metric is the F1 score of the positive class, which corresponds to the harmonic mean of Precision and Recall. We propose a transfer learning approach to build our claim detection classifier. We leveraged our dataset to progressively fine-tune a pre-trained XLM-RoBERTa<sup>1</sup> [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] model for the claim identification task. This model is based on BERT, and hence we follow the recommended practices for fine-tuning a BERT architecture for text classification [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The following hyperparameters were adjusted on the validation dataset: epochs = 2, batch_size = 32 and learning_rate = 2e-5. Adam with weight decay was used as the optimizer.
      </p>
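<p>The evaluation setup above can be sketched in plain Python: the 80/10/10 split and the positive-class F1 computed as the harmonic mean of precision and recall. The helper names are ours, and the fine-tuning itself (which would use the Hugging Face transformers library with the reported hyperparameters) is only noted in a comment, since it is not the paper's published code.</p>

```python
import random

def split_80_10_10(samples, seed=0):
    """Shuffle and split into train (80%), validation (10%) and test (10%)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

def f1_positive(y_true, y_pred):
    """F1 of the positive class: harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Fine-tuning itself (not shown): xlm-roberta-base loaded via the transformers
# library, trained for 2 epochs with batch size 32, learning rate 2e-5, and
# AdamW (Adam with weight decay) as the optimizer, as reported above.
```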
      <p>Our final dataset contains 31,883 tweets by Spanish representatives and political parties, 32% of which are labeled positive. While most tweets are written in Spanish, there is a small fraction in Catalan, Galician and Basque, due to the co-existence of several co-official languages in Spain. We do not filter these out because: 1) XLM-RoBERTa is a multilingual model and 2) they are valuable for our use case.</p>
      <p>To better assess the quality of our model, we compare it with two different baselines:</p>
      <p>• LR-NNLM: a logistic regression model which uses NNLM sentence embeddings<sup>2</sup> [
      <xref ref-type="bibr" rid="ref16">16</xref>
      ] as a feature extraction strategy.</p>
      <p>• SVM-TFIDF: an SVM with a linear kernel whose features are a bag-of-words model limited to the 20,000 most common unigrams, with feature values scaled by the TF-IDF transformation.</p>
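<p>A minimal sketch of the SVM-TFIDF baseline, assuming scikit-learn; the toy sentences and labels below are invented for illustration, and the LR-NNLM baseline is omitted because it requires the TensorFlow Hub embedding module.</p>

```python
# Sketch of the SVM-TFIDF baseline: a bag-of-words limited to the 20,000 most
# common unigrams, TF-IDF scaling, and a linear-kernel SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def build_svm_tfidf():
    return Pipeline([
        ("tfidf", TfidfVectorizer(max_features=20000, ngram_range=(1, 1))),
        ("svm", LinearSVC()),
    ])

# Toy usage with invented sentences (1 = check-worthy, 0 = not):
sentences = [
    "The deficit grew by 4 percent last year.",
    "What a wonderful rally today!",
    "Crime fell 12 percent under our government.",
    "Thanks to everyone who joined us.",
]
labels = [1, 0, 1, 0]
clf = build_svm_tfidf().fit(sentences, labels)
```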
      <p>Hyperparameters such as the vocabulary size and the degree of regularization were adjusted on the validation set, and the best model was selected during training, so the models can be compared fairly. Table 1 summarizes the main results achieved during our tests, comparing our final candidate model with the two proposed baselines.</p>
      <p>1 We used the model xlm-roberta-base, available in the Hugging Face repository through the library transformers==4.2.2.</p>
      <p>2 These embeddings are available at TensorFlow Hub: https://tfhub.dev/google/nnlm-es-dim128/2</p>
      <p>Table 1: Precision, Recall and F1 of each model on the test set. The LR-NNLM baseline achieves a Precision of 66.95% and a Recall of 53.45%.</p>
      <p>Beyond the results on the test set, we have been monitoring the precision achieved by the classifier over time throughout the period we have been using ClaimHunter at Newtral. Figure 3 shows the evolution of the precision metric over the last 20 weeks of 2020.</p>
      <p>The similarity that Catalan and Galician maintain with Spanish means that the model provides particularly good results for these languages without annotated datasets, which could be harder to obtain for these lower-resource languages. This is a desirable outcome for use cases like ours. In Spain, the political debate happens in several languages, and newsrooms need fact-checking tools capable of working in several languages at a time. Quantitative evaluation is still needed to validate our expectations regarding the multilingual capabilities of our model.</p>
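<p>Precision over time, as monitored via Figure 3, can be recovered directly from the Slack feedback: "Reviewed" and "Selected" confirm a positive prediction, while "Rejected" refutes it. A minimal sketch with an invented feedback record and a helper name of our own:</p>

```python
from collections import defaultdict

def weekly_precision(feedback):
    """Compute per-week precision of the classifier from fact-checker feedback.

    feedback: iterable of (week, verdict) pairs, where verdict is one of
    'rejected', 'reviewed', 'selected'. Precision for a week is the share of
    reviewed positive predictions that fact-checkers confirmed.
    """
    confirmed = defaultdict(int)
    total = defaultdict(int)
    for week, verdict in feedback:
        total[week] += 1
        if verdict in ("reviewed", "selected"):  # prediction confirmed
            confirmed[week] += 1
    return {week: confirmed[week] / total[week] for week in total}
```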
    </sec>
    <sec id="sec-7">
      <title>Future directions</title>
      <p>Our iterative approach to model development has shown positive results over time in real-life scenarios. ClaimHunter was designed as an internal tool for Newtral but, in February 2021, we opened it to other fact-checking agencies. Fact-checkers from Chile, Mexico, Ecuador and Colombia are currently testing ClaimHunter in different political contexts. Our goal is to check whether the model generalizes properly and satisfies the mixed criteria of different agencies in different countries. Besides, we are building a more rigorous benchmark to test its multilingual capabilities on Spanish co-official languages. In the future, we also plan to expand ClaimHunter's capabilities to other social networks such as Facebook and Instagram.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Mevan</given-names>
            <surname>Babakar</surname>
          </string-name>
          and
          <string-name>
            <given-names>Will</given-names>
            <surname>Moy</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>The State of Automated Factchecking</article-title>
          .
          <source>Technical report. Retrieved</source>
          from https://fullfact.org/blog/2016/aug/automated-factchecking/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Naeemul</given-names>
            <surname>Hassan</surname>
          </string-name>
          , Bill Adair, James Hamilton,
          <string-name>
            <given-names>Chengkai</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <surname>Mark Tremayne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jun</given-names>
            <surname>Yang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Cong</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>The Quest to Automate Fact-Checking</article-title>
          .
          <source>In Proceedings of the 2015 Computation + Journalism Symposium.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Lucas</given-names>
            <surname>Graves</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Understanding the Promise and Limits of Automated FactChecking</article-title>
          .
          <source>Technical report</source>
          , Reuters Institute, University of Oxford. Retrieved from https://reutersinstitute.politics.ox.ac.uk/our-research/understanding-promise-and-limits-automated-fact-checking
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Israa</given-names>
            <surname>Jaradat</surname>
          </string-name>
          , Pepa Gencheva, Alberto Barrón-Cedeño,
          <string-name>
            <given-names>Lluís</given-names>
            <surname>Màrquez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Preslav</given-names>
            <surname>Nakov</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>ClaimRank: Detecting CheckWorthy Claims in Arabic and English</article-title>
          .
          <source>In Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Neema</given-names>
            <surname>Kotonya</surname>
          </string-name>
          and
          <string-name>
            <given-names>Francesca</given-names>
            <surname>Toni</surname>
          </string-name>
          .
          <year>2020</year>
          . Explainable Automated Fact-Checking: A Survey. arXiv preprint arXiv:2011.03870.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Liesbeth</given-names>
            <surname>Allein</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marie-Francine</given-names>
            <surname>Moens</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Checkworthiness in Automatic Claim Detection Models: Definitions and Analysis of Datasets</article-title>
          . In Multidisciplinary International Symposium on Disinformation in Open Online Media. Springer, Cham,
          <fpage>1</fpage>
          -
          <lpage>17</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Naeemul</given-names>
            <surname>Hassan</surname>
          </string-name>
          , Gensheng Zhang, Fatma Arslan, Josue Caraballo, Damian Jimenez, Siddhant Gawsane, Shohedul Hasan, Minumol Joseph, Aaditya Kulkarni, Anil Kumar Nayak, et al.
          <year>2017</year>
          .
          <article-title>ClaimBuster: the first-ever end-to-end fact-checking system</article-title>
          .
          <source>In Proceedings of the VLDB Endowment</source>
          <volume>10</volume>
          ,
          <issue>12</issue>
          (
          <year>2017</year>
          ),
          <fpage>1945</fpage>
          -
          <lpage>1948</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Lev</given-names>
            <surname>Konstantinovskiy</surname>
          </string-name>
          , Oliver Price, Mevan Babakar and
          <string-name>
            <given-names>Arkaitz</given-names>
            <surname>Zubiaga</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Towards automated factchecking: Developing an annotation schema and benchmark for consistent automated claim detection</article-title>
          . arXiv preprint arXiv:1809.08193.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Bill</given-names>
            <surname>Adair</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Squash report card: Improvements during State of the Union … and how humans will make our AI smarter</article-title>
          .
          <source>Retrieved February 9</source>
          ,
          <year>2021</year>
          from https://reporterslab.org/squash-report-card-improvements-during-state-of-the-union-and-how-humans-will-make-our-ai-smarter/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>James</given-names>
            <surname>Thorne</surname>
          </string-name>
          , Andreas Vlachos, Christos Christodoulopoulos, and
          <string-name>
            <given-names>Arpit</given-names>
            <surname>Mittal</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>FEVER: a large-scale dataset for fact extraction and verification</article-title>
          .
          <source>In NAACL-HLT.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>James</given-names>
            <surname>Thorne</surname>
          </string-name>
          , Andreas Vlachos, Christos Christodoulopoulos, and
          <string-name>
            <given-names>Arpit</given-names>
            <surname>Mittal</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>The fact extraction and verification (fever) shared task</article-title>
          .
          <source>arXiv preprint arXiv:1811</source>
          .10971.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Alberto</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          , Tamer Elsayed, Preslav Nakov, Giovanni Da San Martino, Maram Hasanain, Reem Suwaileh and
          <string-name>
            <given-names>Fatima</given-names>
            <surname>Haouari</surname>
          </string-name>
          .
          <year>2020</year>
          . CheckThat! at CLEF 2020:
          <article-title>Enabling the Automatic Identification and Verification of Claims in Social Media</article-title>
          .
          <source>ECIR 2020. Lecture Notes in Computer Science</source>
          , vol
          <volume>12036</volume>
          . Springer, Cham. https://doi.org/10.1007/978-3-030-45442-5_65
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Gullal</given-names>
            <surname>Cheema</surname>
          </string-name>
          , Sherzod Hakimov and
          <string-name>
            <given-names>Ralph</given-names>
            <surname>Ewerth</surname>
          </string-name>
          .
          <year>2020</year>
          . Check_square at CheckThat! 2020:
          <article-title>Claim Detection in Social Media via Fusion of Transformer and Syntactic Features</article-title>
          . arXiv preprint arXiv:
          <year>2007</year>
          .10534
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Alexis</given-names>
            <surname>Conneau</surname>
          </string-name>
          , Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer,
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Unsupervised Cross-lingual Representation Learning at Scale</article-title>
          .
          <article-title>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</article-title>
          . DOI: https://doi.org/10.18653/v1/2020.acl-main.747
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
          <article-title>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</article-title>
          . DOI: https://doi.org/10.18653/v1/N19-1423
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          , Réjean Ducharme, Pascal Vincent,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Jauvin</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>A Neural Probabilistic Language Model</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          ,
          <fpage>1137</fpage>
          -
          <lpage>1155</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>