<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Conference and Labs of the Evaluation Forum, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Accenture at CheckThat! 2023: Impacts of Back-translation on Subjectivity Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sieu Tran</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paul Rodrigues</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Strauss</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evan M. Williams</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Accenture</institution>
          ,
          <addr-line>1201 New York Ave NW, Washington, DC 20005</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Carnegie Mellon University</institution>
          ,
          <addr-line>5000 Forbes Avenue, Pittsburgh, PA 15213</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>8</fpage>
      <lpage>21</lpage>
      <abstract>
        <p>This paper discusses the CLEF CheckThat! Lab Task 2 on Subjectivity in News Articles and our approach of using back-translation to augment the minority classes in Arabic, English, Turkish, German, Italian, and Dutch in order to distinguish subjective from objective statements. While we find that back-translation works well for other tasks in the fact-checking pipeline, we find that it does not work as well for subjectivity detection. This paper begins to examine several reasons why back-translation as an NLP data augmentation strategy could inhibit subjectivity detection.</p>
      </abstract>
      <kwd-group>
        <kwd>subjectivity detection</kwd>
        <kwd>opinion detection</kwd>
        <kwd>news analysis</kwd>
        <kwd>data-driven journalism</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>children, including one entitled What Is White Privilege?" is labeled as ’Objective’, rather than
’Subjective’. As the sentence contains specific, falsifiable claims, this seems to be a reasonable
labeling. However, the characterization of the books as tools of ‘Leftist indoctrination’ is clearly
a subjective editorialization on the part of the author. This highlights the inherent ambiguity
present in the task and underscores a core challenge that annotators and models both
face in learning a clear decision boundary.</p>
      <p>
        In this work, we describe the back-translation augmentation strategies and models employed
by Team Accenture’s submissions to Task 2. Team Accenture’s back-translation and transformer
approach placed 3rd in Arabic, 4th in Turkish, 5th in Dutch, and 8th
in German and English. While back-translation has been shown to be an effective means of
NLP data augmentation for improving checkworthiness identification [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we speculate that the
approach may reduce the ability of models to generalize in a subjectivity detection task, and we
explore some reasons why this may be the case.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Exploratory Analysis</title>
      <p>Table 1 shows the number of samples and unique word counts for each of the datasets provided.
We see that Italian had the largest number of samples in training (1,613). However, Arabic had
the highest count of unique words (12,181), while German (4,622) and Dutch (3,944) had the
lowest. Assuming consistent data collection methodology and annotation standards across
languages, we would hypothesize that a larger quantity of unique words would yield
higher-accuracy models. The sample size of all languages in this task is relatively small compared to
the other tasks in the CheckThat! Lab.</p>
      <p>As shown in Figure 1, all of the datasets provided by the CheckThat! organizers had a label
imbalance that skewed each dataset towards sentences labeled ’objective’.</p>
      <p>Transformer models utilize WordPiece tokenization schemes that are dependent on the model
being evaluated. At the time of pre-training, the WordPiece algorithm determines which pieces
of words will be retained and which will be discarded. An unknown (UNK) token serves as
a placeholder in the lexicon and represents WordPiece tokens in novel input
that were not retained at model creation.</p>
      <p>
        The proportion of out-of-vocabulary tokens has been shown to correlate inversely with
overall accuracy [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], so we explore proportions of UNK in each dataset to ensure our models are
not excluding too many tokens from any language. We present our analysis in Table 2. Most
notably, the Arabic training set has the highest WordPiece count, 43,601. Since the unknown-token
rates are mostly negligible across all languages, we expect the count and diversity of WordPieces
to influence model performance the most. The RoBERTa tokenizers we used
did not return UNK tokens on any dataset provided by the CLEF CheckThat! organizers; this is
likely because RoBERTa uses a byte-level BPE vocabulary, which can encode any input without an unknown token.
      </p>
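      <p>The UNK proportion discussed above can be illustrated with a minimal sketch. It assumes a greedy longest-match WordPiece split over a tiny, hypothetical vocabulary and whitespace pre-tokenization; the function names and the vocabulary are illustrative and are not taken from the models or tooling used in this work.</p>
      <preformat><![CDATA[
```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into WordPieces."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry the ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # the whole word maps to UNK if any part cannot be matched
        pieces.append(match)
        start = end
    return pieces


def unk_rate(sentences, vocab, unk="[UNK]"):
    """Share of produced tokens that are UNK (whitespace pre-tokenization)."""
    tokens = [
        piece
        for sentence in sentences
        for word in sentence.lower().split()
        for piece in wordpiece_tokenize(word, vocab, unk)
    ]
    return tokens.count(unk) / len(tokens) if tokens else 0.0


# Hypothetical toy vocabulary, for illustration only.
TOY_VOCAB = {"sub", "##ject", "##ive", "news"}
```
]]></preformat>
      <p>With the toy vocabulary, the word "subjective" splits into three known pieces, while a word with no matching pieces collapses to a single UNK token and raises the reported rate.</p>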
    </sec>
    <sec id="sec-3">
      <title>3. Transformer Architectures and Pre-Trained Models</title>
      <p>
        In this work, we utilize BERT and RoBERTa models. Bidirectional Encoder Representations
from Transformers (BERT) is a transformer-based architecture that was introduced in 2018 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. BERT
has had a substantial impact on the field of NLP and achieved state-of-the-art results on 11 NLP
benchmarks at the time of its release. RoBERTa, introduced by [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], modified various parts of
BERT’s training process. These modifications include more training data, more pre-training
steps with bigger batches, removing BERT’s Next Sentence Prediction objective, training
on longer sequences, and dynamically changing the masking pattern applied to the training
data [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        For the Arabic dataset, we used lanwuwei/GigaBERT-v4-Arabic-and-English [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], which
was trained on a large-scale corpus (the Arabic version of OSCAR, an Arabic Wikipedia dump,
and Gigaword) with ∼10B tokens. The model shows state-of-the-art zero-shot transfer
performance from English to Arabic on information extraction tasks. Its vocabulary contains
∼21,000 English and ∼26,000 Arabic WordPieces.
For English, we used roberta-large [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The English RoBERTa model has a vocabulary of 50,265 subword tokens.
For Turkish, German, and Italian, we used dbmdz/bert-base-turkish-cased [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
dbmdz/bert-base-german-uncased [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and dbmdz/bert-base-italian-xxl-uncased [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], respectively. The vocabulary
sizes of the Turkish, German, and Italian models are 32,000, 31,102, and 32,102, respectively.
For Dutch, we used GroNLP/bert-base-dutch-cased [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which has a vocabulary size of 30,073.
The foundation model for each language was selected based on models we have used in the
past. Recognizing that this problem should not benefit from case signaling, we chose
the uncased variant for any new model.
      </p>
      <p>
        For experimentation and comparison to roberta-large, we also fine-tune a model that was pre-trained
on a subjectivity/style classification task, cfl/bert-base-styleclassification-subjective-neutral [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
This BERT-based model has been fine-tuned on the Wiki Neutrality Corpus (WNC) - a parallel
corpus of 180,000 biased and neutralized sentence pairs along with contextual sentences and
metadata. The model can be used to classify text as subjectively biased vs. neutrally toned.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. Method</title>
      <sec id="sec-4-1">
        <title>4.1. Data Augmentation</title>
        <p>
          For each language, we augmented the training data via back-translation into the respective
language using AWS translation. We back-translated the minority class in each dataset, which
was always the subjective class, and appended the back-translated subjective documents to the
training set. In our 2021 experiment [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], we found that this form of augmentation resulted in
a significant increase in recall and F1-score for the positive class. We did not use any dataset
outside the one provided by the organizers for data augmentation.
        </p>
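        <p>The augmentation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the submissions: the <monospace>translate</monospace> callable is a hypothetical stand-in for a machine translation service, and the language codes and label strings are illustrative.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def back_translate(text, source_lang, pivot_langs, translate):
    """Translate through each pivot language in turn, then back to the source language.

    `translate(text, src, tgt)` is a hypothetical callable standing in for an MT service.
    """
    current, current_lang = text, source_lang
    for pivot in pivot_langs:
        current = translate(current, current_lang, pivot)
        current_lang = pivot
    return translate(current, current_lang, source_lang)


def augment_minority(samples, source_lang, pivot_langs, translate):
    """Append back-translated copies of the minority-class samples to the training set."""
    counts = Counter(label for _, label in samples)
    minority = min(counts, key=counts.get)
    augmented = list(samples)
    for text, label in samples:
        if label == minority:
            new_text = back_translate(text, source_lang, pivot_langs, translate)
            augmented.append((new_text, label))
    return augmented


def fake_translate(text, src, tgt):
    """Stub translator for demonstration: it only records the hop it was asked to make."""
    return f"{text} [{src}-{tgt}]"


train = [("a", "OBJ"), ("b", "OBJ"), ("c", "SUBJ")]
augmented = augment_minority(train, "ar", ["en"], fake_translate)
```
]]></preformat>
        <p>Passing a longer list of pivot languages chains additional translation hops before returning to the source language, which corresponds to the multi-pivot settings compared later in the paper.</p>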
        <p>In this work, we fine-tune lanwuwei/GigaBERT-v4-Arabic-and-English at different levels of
data augmentation and compare performance on the gold test set provided by the organizers.</p>
        <p>Table 3 shows the BLEU score for each back-translation scheme. The higher the score, the more
similar the back-translation is to the original text. Table 4 shows the training sample
size before and after data augmentation, and Table 5 shows the number of new tokens acquired
after back-translation for each language. For Arabic and Italian, BLEU scores decrease as more pivot
languages are used for back-translation, as we would expect. Because a perfect translation would not
provide variation in the training samples, and a low BLEU score may not provide consistent
variation, this may suggest that there is a sweet spot in BLEU score for an NLP data augmentation
task, one that provides diverse word selection but consistent translations.</p>
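        <p>To make the BLEU comparison concrete, the following is a simplified sentence-level sketch with uniform n-gram weights, a brevity penalty, and no smoothing. The tokenization (whitespace) and the sentence-level formulation are assumptions made for illustration; the exact BLEU configuration behind Table 3 is not specified here.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform weights over 1..max_n grams."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped overlap: each candidate n-gram counts at most as often as in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any empty n-gram level zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) > len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / max_n)


identical = sentence_bleu("the cat sat on the mat", "the cat sat on the mat")
partial = sentence_bleu("the cat sat on the mat", "the cat sat on a mat")
```
]]></preformat>
        <p>An unchanged back-translation scores 1.0 and contributes no new variation, while a single substituted word already lowers the score noticeably, which is the trade-off behind the sweet spot suggested above.</p>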
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Classification</title>
        <p>For all BERT and RoBERTa models utilized across all languages, we added a
mean-pooling layer and a dropout layer on top of the model prior to the final classification layer. Adding
these layers has been shown to help prevent over-fitting while fine-tuning. We used
an Adam optimizer with a learning rate of 2e-5 and an epsilon of 1.5e-8. We used a binary
cross-entropy loss function, 4 epochs, and a batch size of 32.</p>
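        <p>The classification head can be illustrated with a framework-free sketch of masked mean pooling followed by a single logistic output and a binary cross-entropy loss. The function names, the toy embeddings, and the weights are hypothetical; dropout and the Adam update are omitted, so this shows only the forward computation and is not the training code used for the submissions.</p>
        <preformat><![CDATA[
```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    if not kept:
        return [0.0] * dim
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def predict_proba(pooled, weights, bias):
    """Linear layer plus sigmoid: probability of the positive (subjective) class."""
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def binary_cross_entropy(prob, label, eps=1e-12):
    """Binary cross-entropy for one example; prob is clipped for numerical safety."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


# Toy input: three token vectors, the last one is padding.
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
prob = predict_proba(pooled, weights=[0.0, 0.0], bias=0.0)
loss = binary_cross_entropy(prob, label=1)
```
]]></preformat>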
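        <table-wrap>
          <table>
            <thead>
              <tr>
                <th>Language</th>
                <th>SUBJ</th>
                <th>OBJ</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Arabic</td>
                <td>280</td>
                <td>905</td>
              </tr>
              <tr>
                <td>Dutch</td>
                <td>311</td>
                <td>489</td>
              </tr>
              <tr>
                <td>English</td>
                <td>298</td>
                <td>532</td>
              </tr>
              <tr>
                <td>German</td>
                <td>308</td>
                <td>492</td>
              </tr>
              <tr>
                <td>Italian</td>
                <td>382</td>
                <td>1231</td>
              </tr>
              <tr>
                <td>Turkish</td>
                <td>378</td>
                <td>422</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>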
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>Tables 6 and 7 contain the performance of all models on the test set provided by the organizers.
Our Arabic model had an accuracy of 0.800 with a weighted average F1-score of 0.816. Our
English model had an accuracy of 0.696 with a weighted average F1-score of 0.687. For Turkish,
we had an accuracy of 0.788 and a weighted average F1-score of 0.784. German received an
accuracy of 0.337 and an F1-score of 0.174. Italian had an accuracy of 0.689 and an F1-score of 0.706.
Finally, our Dutch model had an accuracy of 0.646 and a weighted F1-score of 0.618.</p>
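      <p>For readers comparing the two metrics reported above, the sketch below shows how accuracy and a support-weighted average F1-score can be computed from gold and predicted labels. It is a minimal illustration with toy labels, not the organizers' scoring script, and the label strings are assumptions.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if not y_true:
        return 0.0
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """F1-score for one class, treating that class as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's gold support."""
    if not y_true:
        return 0.0
    support = Counter(y_true)
    total = sum(f1_for_class(y_true, y_pred, cls) * n for cls, n in support.items())
    return total / len(y_true)


# Toy labels for illustration only.
gold = ["OBJ", "OBJ", "OBJ", "SUBJ"]
pred = ["OBJ", "OBJ", "SUBJ", "SUBJ"]
```
]]></preformat>
      <p>Because the weighted average follows the gold class distribution, a model that performs poorly on one class can show an F1-score that differs noticeably from its accuracy.</p>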
      <p>Tables 8 and 9 show the Arabic model’s performance on the gold test set with different levels of
data augmentation.</p>
      <p>Column headers: Unique tokens in source; Unique tokens in MT; New tokens in MT.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>We observe that a specialized style-classification model outperformed the RoBERTa-large
model quite significantly, as seen in Tables 10 and 11. This is likely because a subjectivity
classification task places heavy emphasis on vocabulary and terminology, which are lacking
in the relatively small training set provided. The raw RoBERTa model did not have enough training
vocabulary to outperform a specialized model. We also observe diminishing returns when
over-augmenting the Arabic training set. As mentioned before, vocabulary plays a key role,
and augmenting with several pivot languages may have affected the data quality, potentially
removing keywords that determine subjectivity. Consider the example below of a document
labeled subjective, after only one translation from Arabic to English:</p>
      <p>"Are there any resolutions that the Security Council may issue to ensure that Egypt’s water</p>
      <p>The second round of back-translation (Arabic &gt; English &gt; Spanish &gt; English) then
produces:</p>
      <p>"Is there a resolution that the Security Council can issue to ensure that Egypt’s
water quota in the Nile River is not affected?"</p>
      <p>And the third (Arabic &gt; English &gt; French &gt; English) produces:</p>
      <p>"Are there resolutions that the Security Council could adopt to ensure that Egypt’s
share of water in the Nile is not affected?"</p>
      <p>By the second or third translation, the tone of the statement has shifted toward a much more
objective register. This results in much lower model performance. We can see the results of these
experiments in Table 8.</p>
      <p>
        Due to the extremely small number of samples in the subjective class, we augmented the Arabic and Italian
training data three times. Table 12 shows the average cosine similarity score between each
translation result and the original, and the weighted average sentiment score of the pivoting
English back-translation based on the VADER lexicon [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. For Arabic, there was no notable
difference between the scores. However, for Italian, cosine similarity shows small decreases as
more layers of back-translation are added, indicating a small level of semantic drift. Additionally,
the mean sentiment score decreases, indicating that the subjectivity level of the lexicon decreases as well.
      </p>
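      <p>The similarity measurement can be sketched as an average cosine similarity between each original sentence and its back-translation. The text representation behind Table 12 is not specified here, so the bag-of-words vectors below are an assumption chosen to keep the example self-contained; sentence embeddings would slot into the same computation.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity(originals, translations):
    """Average similarity over aligned (original, back-translation) pairs."""
    pairs = list(zip(originals, translations))
    if not pairs:
        return 0.0
    return sum(cosine_similarity(o, t) for o, t in pairs) / len(pairs)
```
]]></preformat>
      <p>A falling average across successive rounds of back-translation would indicate the kind of drift described above, although a bag-of-words measure captures lexical overlap rather than meaning.</p>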
      <p>Our paper suggests there may be a ’sweet spot’ in BLEU score for back-translation data
augmentation, where a perfect translation would not add sufficient noise to the training data
and a poor translation would not add sufficient context. We would recommend exploration of
the BLEU score space as an optimization problem in future work.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>We have described the back-translation augmentation strategies and models employed by Team
Accenture’s submissions to Task 2. Team Accenture’s back-translation and foundation model
approach placed 3rd in Arabic, 4th in Turkish, 5th in Dutch, and 8th
in German and English. In future work, we hope to explore in more detail the extent to which
back-translation data augmentation can inhibit subjectivity detection systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Cambria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Poria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gelbukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thelwall</surname>
          </string-name>
          ,
          <article-title>Sentiment analysis is a big suitcase</article-title>
          ,
          <source>IEEE Intelligent Systems</source>
          <volume>32</volume>
          (
          <year>2017</year>
          )
          <fpage>74</fpage>
          -
          <lpage>80</lpage>
          . doi:10.1109/MIS.2017.4531228.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>Chaturvedi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cambria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Welsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <article-title>Distinguishing between facts and opinions for sentiment analysis: Survey and challenges</article-title>
          ,
          <source>Information Fusion</source>
          <volume>44</volume>
          (
          <year>2018</year>
          )
          <fpage>65</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L. L.</given-names>
            <surname>Vieira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. L. M.</given-names>
            <surname>Jeronimo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Campelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Marinho</surname>
          </string-name>
          ,
          <article-title>Analysis of the subjectivity level in fake news fragments</article-title>
          ,
          <source>in: Proceedings of the Brazilian Symposium on Multimedia and the Web</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>233</fpage>
          -
          <lpage>240</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Jeronimo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>Marinho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. E.</given-names>
            <surname>Campelo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Veloso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. S.</given-names>
            <surname>da Costa Melo</surname>
          </string-name>
          ,
          <article-title>Characterization of fake news based on subjectivity lexicons</article-title>
          ,
          <source>J. Data Intell</source>
          .
          <volume>1</volume>
          (
          <year>2020</year>
          )
          <fpage>419</fpage>
          -
          <lpage>441</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>P.</given-names>
            <surname>Kasnesis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Toumanidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. Z.</given-names>
            <surname>Patrikakis</surname>
          </string-name>
          ,
          <article-title>Combating fake news with transformers: A comparative analysis of stance detection and subjectivity analysis</article-title>
          ,
          <source>Information</source>
          <volume>12</volume>
          (
          <year>2021</year>
          )
          <fpage>409</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Galassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ruggeri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barrón-Cedeño</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kutlu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Struss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Antici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hasanain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Köhler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Leistra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Muti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Siegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Turkmen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zaghouani</surname>
          </string-name>
          ,
          <article-title>Overview of the CLEF-2023 CheckThat! lab task 2 on subjectivity in news articles</article-title>
          , in:
          <source>Working Notes of CLEF 2023 - Conference and Labs of the Evaluation Forum</source>
          , CLEF '
          <year>2023</year>
          , Thessaloniki, Greece,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rodrigues</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <article-title>Accenture at CheckThat! 2021: Interesting claim identification and ranking with contextually sensitive lexical training data augmentation</article-title>
          ,
          <year>2021</year>
          . arXiv:2107.05684.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Aye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Learning autocompletion from real-world datasets</article-title>
          , in:
          <source>2021 IEEE/ACM 43rd International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>131</fpage>
          -
          <lpage>139</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          , arXiv preprint arXiv:1810.04805 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          , CoRR abs/1907.11692 (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W.</given-names>
            <surname>Lan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ritter</surname>
          </string-name>
          ,
          <article-title>An empirical study of pre-trained transformers for Arabic information extraction</article-title>
          , arXiv preprint arXiv:2004.14519 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schweter</surname>
          </string-name>
          ,
          <article-title>BERTurk - BERT models for Turkish</article-title>
          ,
          <year>2020</year>
          . URL: https://doi.org/10.5281/zenodo.3770924. doi:10.5281/zenodo.3770924.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Chan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schweter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Möller</surname>
          </string-name>
          ,
          <article-title>German's next language model</article-title>
          ,
          <year>2020</year>
          . arXiv:2010.10906.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schweter</surname>
          </string-name>
          ,
          <article-title>Italian BERT and ELECTRA models</article-title>
          ,
          <year>2020</year>
          . URL: https://doi.org/10.5281/zenodo.4263142. doi:10.5281/zenodo.4263142.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>W.</given-names>
            <surname>de Vries</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>van Cranenburgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bisazza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Caselli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>van Noord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Nissim</surname>
          </string-name>
          ,
          <article-title>Bertje: A Dutch BERT model</article-title>
          ,
          <year>2019</year>
          . arXiv:1912.09582.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sundararajan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Taly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <article-title>Axiomatic attribution for deep networks</article-title>
          ,
          <year>2017</year>
          . arXiv:1703.01365.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Hutto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gilbert</surname>
          </string-name>
          ,
          <article-title>VADER: A parsimonious rule-based model for sentiment analysis of social media text</article-title>
          ,
          <source>Proceedings of the International AAAI Conference on Web and Social Media</source>
          <volume>8</volume>
          (
          <year>2014</year>
          )
          <fpage>216</fpage>
          -
          <lpage>225</lpage>
          . URL: https://ojs.aaai.org/index.php/ICWSM/article/view/14550. doi:10.1609/icwsm.v8i1.14550.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>