<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Fake News in Tweets from Text and Propagation Graph: IRISA's Participation to the FakeNews Task at MediaEval 2020</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vincent Claveau</string-name>
          <email>vincent.claveau@irisa.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CNRS, IRISA</institution>
          ,
          <addr-line>Univ. Rennes</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper presents IRISA's participation in the task of fake news detection from tweets, relying either on the text or on propagation information. For text-based detection, variants of BERT-based classification are proposed. In order to improve this standard approach, we investigate the interest of augmenting the dataset by creating tweets with fine-tuned generative models. For graph-based detection, we propose models characterizing the propagation of the news or the users' reputation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION AND RELATED WORK</title>
      <p>
        This paper describes the systems that we developed for the
text-based and structure-based MediaEval 2020 Fake News detection
challenge. These two subtasks and the datasets are detailed in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Text classification is a common NLP task [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Although simple
machine learning approaches have shown promising results for
fake news detection [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], the recent transformer-based architectures,
such as BERT [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], have set new standards. Several large pre-trained
transformer models are now available; they are known to yield
state-of-the-art results on many NLP tasks, including text classification
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        , inter alia]. We rely on one of these pre-trained models to build
our systems. In order to improve this standard approach, we have
investigated the interest of augmenting the dataset artificially by
generating tweets with fine-tuned generative models (one for each
class). These approaches and results are detailed in Sec. 2.
      </p>
      <p>
        Similarly, classification of data represented as a graph, and in
particular node classification, is not new but the recent trend is
to use deep learning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Yet, for the specific domain of fake news
detection, other approaches are possible. In particular, it has been
shown that fake news is propagated differently (and faster)
than legitimate news [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The use of node reputation and link-based
analysis, as it is done in the detection of spam web pages from the
Web graph (such as TrustRank [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], an adaptation of PageRank [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ])
is another inspiration for our approaches. Our two approaches are
further detailed in Sec. 3.
      </p>
    </sec>
    <sec id="sec-2">
      <title>TEXT-BASED APPROACHES</title>
    </sec>
    <sec id="sec-3">
      <title>Pre-processing</title>
      <p>
        From the tweets still online, the text is extracted and pre-processed
as follows. (At retrieval time, 227, 128 and 80 tweets were no longer
available for the classes ’non’, ’5G’ and ’other’, respectively, in the
dev set.) Emojis are transformed into text [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. URLs are changed.
      </p>
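      <p>The emoji-to-text step above can be sketched as follows. This is a minimal illustration: a tiny hand-made mapping stands in for the full emoji library cited as [13], and the URL placeholder token is an assumption, since the exact replacement is not specified in the text.</p>

```python
import re

# Illustrative subset of an emoji -> text mapping; the emoji library [13]
# covers the full Unicode emoji table.
EMOJI_MAP = {"🔥": ":fire:", "😷": ":face_with_medical_mask:", "📡": ":satellite_antenna:"}

URL_RE = re.compile(r"https?://\S+")

def preprocess(tweet: str) -> str:
    # Transform emojis into their textual aliases.
    for emo, txt in EMOJI_MAP.items():
        tweet = tweet.replace(emo, " " + txt + " ")
    # Hypothetical URL normalization: replace links with a placeholder token.
    tweet = URL_RE.sub("<url>", tweet)
    # Collapse the extra whitespace introduced above.
    return " ".join(tweet.split())

print(preprocess("5G towers 📡 cause covid?? 🔥 https://t.co/abc123"))
# → 5G towers :satellite_antenna: cause covid?? :fire: <url>
```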
    </sec>
    <sec id="sec-4">
      <title>Generating artificial examples</title>
      <p>
        For this task we wanted to investigate the use of generative models
in order to artificially augment and balance the datasets. Indeed,
the performance of neural language models based on transformers
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] makes this task realistic. To do so, we use GPT2 (Generative
Pre-Trained Transformers), a model built from stacked transformers
(precisely, decoders) trained on a large corpus by auto-regression
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Three GPT2 models – one for each class – are fine-tuned
(from the 355M-parameter pre-trained model) with the tweets of
the dev set. The number of tweets available is very small; we stopped
the iterations when perplexity reached 0.5. The way this stopping
criterion impacts the results would need further investigation,
which was not possible in the limited time of the challenge.
For the generation, we randomly picked tweets and kept the first
two words to serve as a bootstrap prompt. The temperature, which controls
the creativity of the model, was set to 0.7. Here again, we had no
time to investigate the impact of this parameter. Approximately
20,000 tweets were generated for each class. Here are some tweets
generated for the class ’5G conspiracy’:
Crude and unproductive! Turn off the 5G in your area and see
if that helps. Covid19 is not funny. I hope that the Wuhan
government puts an end to this immediately.
"Immigrants are the cause of 5G towers, they’re the cause
of the coronavirus outbreak, they’re the covid-19 victims,
the 5G towers are the weapon which will eradicate the world
population, 5G lays the microchips for the virus, i read
somewhere that the 5G was debuting prior to the introduction of
the COVID-19 virus to negate some of the hype around COVID-19
      </p>
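      <p>The bootstrap construction described above can be sketched as follows; the helper below is hypothetical (not the paper's code) and the generation step itself, which requires the class-specific fine-tuned GPT-2 models, is only indicated in a comment.</p>

```python
import random

def bootstrap_prompts(tweets, n, seed=0):
    """Sample n tweets (with replacement) and keep the first two words of each."""
    rng = random.Random(seed)
    picked = [rng.choice(tweets) for _ in range(n)]
    return [" ".join(t.split()[:2]) for t in picked]

dev_tweets = [
    "the 5G towers are dangerous",
    "covid is not related to phone masts",
    "stay safe and wash your hands",
]
prompts = bootstrap_prompts(dev_tweets, n=4)
print(prompts)
# Each prompt would then seed the class-specific GPT-2 model,
# sampled with temperature 0.7, to produce one artificial tweet.
```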
    </sec>
    <sec id="sec-5">
      <title>Classification models</title>
      <p>
        Our four classification variants are based on the RoBERTa-large model
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It was preferred over other transformer-based representations
because its tokenizer is expected to be better suited to the specifics
of tweet writing. We have tested models with different classification
layers (SVM, logistic regression), with or without fine-tuning, and
with or without artificial examples. Finally, the submitted runs are
the following:
model 1: tweet embedding from the RoBERTa model (not fine-tuned),
and SVM (RBF kernel);
model 2: RoBERTa model with a linear classification layer, fine-tuned
on the task (3 epochs);
model 3: same as model 2, with artificially generated examples (3
epochs);
model 4: same as model 3 (4 epochs).
      </p>
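      <p>Model 1 can be sketched with scikit-learn; random vectors stand in for the (not fine-tuned) RoBERTa tweet embeddings, so this only illustrates the embedding-plus-SVM pipeline, not the actual run.</p>

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in "tweet embeddings": 3 classes, 30 examples each, 16 dimensions,
# with class-dependent means so the toy problem is separable.
X = np.concatenate([rng.normal(loc=3.0 * c, size=(30, 16)) for c in range(3)])
y = np.repeat(["5G", "other", "non"], 30)

clf = SVC(kernel="rbf", gamma="scale")  # RBF kernel, as in the submitted run
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy on the toy data
```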
    </sec>
    <sec id="sec-6">
      <title>Results of text-based detection</title>
      <p>The results of our models are given in Tab. 1. When available, in
addition to the official score on the test set, we provide the Matthews
correlation coefficient (MCC), micro-F1 (accuracy) and macro-F1 on
the dev data (80% for training, 20% for validation). Note that due to
the cost of the artificial example generation and the small amount
of data, the GPT2 models are fine-tuned on all the available dev
data; we do not have reliable results for models 3 and 4 (generated
tweets added to the training set can be very similar to those in the
validation set).</p>
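      <p>The metrics mentioned above (MCC, micro-F1, macro-F1) are all available in scikit-learn; a small sketch on toy labels:</p>

```python
from sklearn.metrics import f1_score, matthews_corrcoef

y_true = ["5G", "5G", "other", "non", "non", "other"]
y_pred = ["5G", "other", "other", "non", "non", "5G"]

mcc = matthews_corrcoef(y_true, y_pred)
micro = f1_score(y_true, y_pred, average="micro")  # equals accuracy (4/6 here)
macro = f1_score(y_true, y_pred, average="macro")
print(mcc, micro, macro)
```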
      <p>From the results, we see that fine-tuning the representation
(model 2 vs. model 1) is beneficial. Unfortunately, the artificially
generated tweets (models 3 and 4) do not yield the expected
improvement. From the confusion matrices, one can see that the class
’other conspiracy’ has the poorest results, with tweets being equally
labeled as ’5G’, ’non’ or ’other’.
</p>
    </sec>
    <sec id="sec-7">
      <title>GRAPH-BASED APPROACHES</title>
      <p>For the second sub-task, we have proposed two models, based on
two different sets of features. They are described in the following
subsections, as well as the machine learning algorithms adopted
and their results.
</p>
    </sec>
    <sec id="sec-8">
      <title>Modeling the user’s reputation</title>
      <p>
        This set of features aims at taking into account whether one of the users
posting or propagating the news has already been seen. Each user is
indeed associated with a score for each possible label, computed
from the number of training samples of each class it was associated
with. We also take into account the scores of the neighbors of this
user, their own neighbors, and so on. In practice, this is
implemented with the PageRank algorithm [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] on the undirected graph
with a damping factor set to 0.8 (optimized by cross-validation).
Finally, each sample ends up with one value for each class; these
three scores are the features used by the classifier.
      </p>
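      <p>The reputation diffusion described above can be sketched as a personalized PageRank power iteration. This is a toy NumPy version under stated assumptions (a small undirected graph, one personalization vector per class built from training label counts); the actual run applies the PageRank of [1] to the real Twitter graph.</p>

```python
import numpy as np

def personalized_pagerank(A, p, d=0.8, iters=100):
    """A: symmetric adjacency matrix; p: personalization vector summing to 1;
    d: damping factor (0.8, as optimized by cross-validation)."""
    deg = A.sum(axis=1)
    T = A / np.where(deg == 0, 1, deg)[:, None]  # row-stochastic transition
    r = p.copy()
    for _ in range(iters):
        r = d * (T.T @ r) + (1 - d) * p          # damped diffusion toward p
    return r

# Toy undirected graph: users 0-1-2-3 in a chain.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# Personalization for class '5G': user 0 was seen once with that label.
p_5g = np.array([1.0, 0.0, 0.0, 0.0])
scores = personalized_pagerank(A, p_5g)
print(scores)  # user 1 (a direct neighbor) inherits more '5G' reputation than user 3
```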
      <p>
        Several learning algorithms have been tested (logistic regression,
random forests, SVM; as implemented in scikit-learn [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]). The
optimal settings for their hyper-parameters are grid-searched using 20%
of the dev set as validation set. The weight of each sample is adapted
according to the inverse of its class proportion (’balanced’ strategy).
With their optimal settings, the different learning algorithms finally
show little difference. For this set of features, the submitted run
was produced with a random forest (1,000 trees with a maximal
depth set to 5, out-of-bag weights used in the prediction).
      </p>
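      <p>The reputation run can be sketched with scikit-learn's random forest; synthetic three-dimensional score vectors stand in for the real per-class PageRank features, and the label imbalance motivates the ’balanced’ weighting. The out-of-bag score shown is only an analogue of the out-of-bag weighting used in the actual run.</p>

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Toy reputation features: one score per class, imbalanced label counts.
sizes = [(0, 60), (1, 25), (2, 15)]
X = np.concatenate([rng.normal(0.0, 0.5, size=(n, 3)) + 2.0 * np.eye(3)[c]
                    for c, n in sizes])
y = np.concatenate([np.full(n, c) for c, n in sizes])

clf = RandomForestClassifier(n_estimators=1000, max_depth=5,
                             class_weight="balanced", oob_score=True,
                             random_state=0)
clf.fit(X, y)
print(clf.oob_score_)  # out-of-bag estimate on the toy data
```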
    </sec>
    <sec id="sec-9">
      <title>Modeling the propagation</title>
      <p>This set of features is built by considering how the tweet is
propagated (without considering the users’ reputations). These features
can be used even if every involved user has never been seen
before and is not connected to any known user. The features include
(with n0 the first user tweeting the piece of news): number of nodes
in the propagation graph; total number of friends and followers
(for all nodes implied), as well as the median, 25% percentile, 75%
percentile of followers; number of followers and friends of n0;
difference between the number of followers and friends of n0; maximal,
minimal, average, median, 25% percentile, 75% percentile of retweet
time; times to reach at least 100, 1,000, 10,000 followers and so
on up to 200,000 followers. With this set of features, an SVM has
been used with the following parameters: standardized features
(mean removed and scaled to unit variance), RBF kernel, C=0.9,
gamma automatically set with the ’scale’ heuristic.</p>
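      <p>A few of the propagation features listed above can be sketched on a toy cascade; the variable names are illustrative, and real code must also handle cascades that never reach a given follower threshold.</p>

```python
import numpy as np

retweet_times = np.array([12.0, 30.0, 45.0, 60.0, 300.0])  # seconds after n0's tweet
followers = np.array([50, 1200, 300, 80, 15000])           # followers of each retweeter
n0_followers, n0_friends = 400, 350

features = {
    "n_nodes": len(retweet_times) + 1,                     # cascade size incl. n0
    "total_followers": int(followers.sum()),
    "median_followers": float(np.median(followers)),
    "p25_followers": float(np.percentile(followers, 25)),
    "p75_followers": float(np.percentile(followers, 75)),
    "n0_follower_friend_diff": n0_followers - n0_friends,
    "median_retweet_time": float(np.median(retweet_times)),
    # Time at which the cumulative reached audience first hits 1,000 followers
    # (a real implementation must guard the index when the threshold is never reached).
    "time_to_1000_followers": float(
        retweet_times[np.searchsorted(np.cumsum(followers), 1000)]),
}
print(features)
```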
    </sec>
    <sec id="sec-10">
      <title>Results of graph-based detection</title>
      <p>The results of the systems are given in Tab. 1. The cross-validation
and official results are consistent; they both show the advantage of
the reputation-based approach, especially when considering
micro-F1. The difference between cross-validation and official test scores
may be explained by a lower proportion of already-seen nodes in the
test set, compared to what was generated by cross-validation. A
system exploiting all the proposed features (propagation +
reputation) was also tested but showed no statistically significant
difference from the reputation-only features.</p>
      <p>For both models, the ’other conspiracy’ class is again the most
error-prone (proportionally), with an equal amount of its tweets
being classified in the three classes. Overall, for both feature sets,
many errors are caused by confusion between the 5G and non-5G
conspiracy tweets.</p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>For the detection of fake news based on the text, we have adopted
a state-of-the-art approach based on RoBERTa. The scores obtained
show that there is still a large margin for progress, especially when
dealing with close classes (5G vs. other conspiracies). The idea
of incorporating artificially generated examples did not result in
better performance and still needs work. First, we may find
better ways to set the training and generation hyper-parameters.
Secondly, we plan to investigate the use of generative models to
expand the samples at inference time.</p>
      <p>
        For the detection based on the structure, we have shown that
simple approaches like reputation already offered promising results,
even on small datasets with many previously unseen nodes. In addition
to this type of approach, we want to explore more recent node
representation techniques that make it possible to use deep learning,
such as node2vec [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] or subsequent variants.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Brin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Page</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>The Anatomy of a Large-Scale Hypertextual Web Search Engine</article-title>
          .
          <source>In Proceedings of the Seventh International Conference on World Wide Web</source>
          <volume>7</volume>
          (
          <issue>WWW7</issue>
          ). Elsevier Science Publishers B. V., Brisbane, Australia,
          <fpage>107</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers).
          <source>Association for Computational Linguistics</source>
          , Minneapolis, Minnesota,
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          . https://doi.org/10.18653/v1/N19-1423
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Aditya</given-names>
            <surname>Grover</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Node2vec: Scalable Feature Learning for Networks</article-title>
          .
          <source>In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '16)</source>
          .
          Association for Computing Machinery
          , New York, NY, USA,
          <fpage>855</fpage>
          -
          <lpage>864</lpage>
          . https://doi.org/10.1145/2939672.2939754
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Gyöngyi</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Garcia-Molina</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Link spam alliances</article-title>
          .
          <source>In Proceedings of the 31st international conference on Very large data bases, VLDB. Trondheim, Norway</source>
          ,
          <fpage>517</fpage>
          -
          <lpage>528</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>William L.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rex</given-names>
            <surname>Ying</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Jure</given-names>
            <surname>Leskovec</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Representation Learning on Graphs: Methods and Applications</article-title>
          .
          <source>IEEE Computer Society Technical Committee on Data Engineering</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Kamran</given-names>
            <surname>Kowsari</surname>
          </string-name>
          , Kiana Jafari Meimandi, Mojtaba Heidarysafa, Sanjana Mendu, Laura Barnes, and
          <string-name>
            <given-names>Donald</given-names>
            <surname>Brown</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Text Classification Algorithms: A Survey</article-title>
          .
          <source>Information</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Yinhan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen,
          <string-name>
            <given-names>Omer</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mike</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach</article-title>
          . (
          <year>2019</year>
          ). arXiv:cs.CL/1907.11692
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Cédric</given-names>
            <surname>Maigrot</surname>
          </string-name>
          , Vincent Claveau, Ewa Kijak, and
          <string-name>
            <given-names>Ronan</given-names>
            <surname>Sicre</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>MediaEval 2016: A multimodal system for the Verifying Multimedia Use task</article-title>
          .
          <source>In MediaEval 2016: ”Verifying Multimedia Use” task</source>
          . Hilversum, Netherlands.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Daniel Thilo Schroeder, Luk Burchard, Johannes Moe, Stefan Brenner, Petra Filkukova, and
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Langguth</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020</article-title>
          . In MediaEval 2020 Workshop.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          , Jeff Wu, Rewon Child, David Luan,
          <string-name>
            <given-names>Dario</given-names>
            <surname>Amodei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Language Models are Unsupervised Multitask Learners</article-title>
          .
          <source>OpenAI Blog</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Daniel Thilo</given-names>
            <surname>Schroeder</surname>
          </string-name>
          , Konstantin Pogorelov, and
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Langguth</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>FACT: a Framework for Analysis and Capture of Twitter Graphs</article-title>
          .
          <source>In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS)</source>
          . IEEE,
          <fpage>134</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Taehoon</given-names>
            <surname>Kim</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kevin</given-names>
            <surname>Wurster</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Emoji Python library</article-title>
          . (
          <year>2020</year>
          ). https://pypi.org/project/emoji/
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Ashish</given-names>
            <surname>Vaswani</surname>
          </string-name>
          , Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and
          <string-name>
            <given-names>Illia</given-names>
            <surname>Polosukhin</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Attention is All you Need</article-title>
          .
          <source>In Advances in Neural Information Processing Systems</source>
          30, I. Guyon,
          <string-name>
            <given-names>U. V.</given-names>
            <surname>Luxburg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wallach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fergus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vishwanathan</surname>
          </string-name>
          , and R. Garnett (Eds.). Curran Associates, Inc.,
          <fpage>5998</fpage>
          -
          <lpage>6008</lpage>
          . http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Soroush</given-names>
            <surname>Vosoughi</surname>
          </string-name>
          , Deb Roy, and
          <string-name>
            <given-names>Sinan</given-names>
            <surname>Aral</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>The spread of true and false news online</article-title>
          .
          <source>Science</source>
          <volume>359</volume>
          ,
          <issue>6380</issue>
          (
          <year>2018</year>
          ),
          <fpage>1146</fpage>
          -
          <lpage>1151</lpage>
          . https://doi.org/10.1126/science.aap9559
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Alex</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Amanpreet</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Julian</given-names>
            <surname>Michael</surname>
          </string-name>
          , Felix Hill,
          <string-name>
            <given-names>Omer</given-names>
            <surname>Levy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Samuel R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding</article-title>
          .
          <source>In 7th International Conference on Learning Representations, ICLR 2019</source>
          . New Orleans, LA, USA, May 6-9, 2019.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>