<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploring Data Augmentation for Classification of Climate Change Denial: Preliminary Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jakub Piskorski</string-name>
          <email>jpiskorski@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nikolaos Nikolaidis</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicolas Stefanovitch</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bonka Kotseva</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Irene Vianini</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sopho Kharazi</string-name>
          <email>sopho.kharazi@ext.ec.europa.eu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens P. Linge</string-name>
          <email>jens.linge@ec.europa.eu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CRI</institution>
          ,
          <addr-line>Luxembourg</addr-line>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>European Commission Joint Research Centre</institution>
          ,
          <addr-line>Ispra</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>In: R. Campos, A. Jorge, A. Jatowt, S. Bhatia, M. Litvak (eds.): Proceedings of the Text2Story'22 Workshop</institution>
          ,
          <addr-line>Stavanger</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Piksel SRL</institution>
          ,
          <addr-line>Ispra</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Polish Academy of Sciences</institution>
          ,
          <addr-line>Warsaw</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Trasys International</institution>
          ,
          <addr-line>Brussels</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>In order to address the growing need for monitoring climate-change denial narratives in online sources, NLP-based methods have the potential to automate this process. Here, we report on preliminary experiments in exploiting Data Augmentation (DA) techniques to improve climate change denial classification. We focus on a selection of both known techniques and augmentation transformations not reported elsewhere, which replace certain types of named entities with a high probability of preserving labels. We also introduce a new benchmark dataset consisting of text snippets extracted from online news, labeled with fine-grained climate change denial types.</p>
      </abstract>
      <kwd-group>
        <kwd>text classification</kwd>
        <kwd>climate change denial</kwd>
        <kwd>machine learning</kwd>
        <kwd>data augmentation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In this paper we explore a selection of Data Augmentation (DA) techniques, both known ones and specific named-entity type replacements, with a focus on transformations with a high probability
of preserving labels. The main drive behind this research is two-fold: first, it emerges from
the need to rapidly develop a production-level component for CC denial text classification for
Europe Media Monitor (EMM), a large-scale media monitoring platform used by EU institutions,
and, secondly, from the scarcity of annotated data for the task at hand. The experiments reported
in this paper build mainly on top of the only publicly available text corpus of CC contrarian
claims, which is labeled using the fine-grained taxonomy presented in [1]. We also present a
preliminary evaluation of some models on a new EMM-derived news snippet corpus reusing
the same taxonomy. The findings contained in this paper are not of a general nature, but rather
specific to the exploited data and domain, paving the way for future in-depth explorations.</p>
      <p>The paper starts with an overview of related work in Section 2. Next, the DA techniques
exploited in our study are described in Section 3, whereas Section 4 introduces a news-derived corpus
of text snippets related to CC denial. The evaluation of the DA techniques' performance is presented
in Section 5. Section 6 provides a detailed analysis of the behaviour of two specific named-entity
replacement-based DA techniques. Finally, we present our conclusions in Section 7.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Only recently has the CC debate received more attention in the NLP community, in the context
of developing solutions for making sense of the vast amount of textual data produced on this
topic [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A corpus of blog posts on CC, manually tagged in terms of scepticism and acceptance
of CC, is presented in [3]. In 2016 a SemEval task on stance detection in tweets, where "CC is a
real concern" was one of the targets, was organized [4]. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] an annotated news corpus for stance toward "climate
change is a real concern" and related experiments are presented, whereas [6] introduced a
dataset for sentence-based climate change topic detection. Finally, [7] reported on a collection
of tweets used to study the public discourse around CC.
      </p>
      <p>
        To the best of our knowledge, only two textual corpora with CC denial and disinformation
labels exist, namely, the corpus of ca. 30K text paragraphs containing contrarian claims about
climate change extracted from conservative think-tank websites and contrarian blogs (4C
corpus) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and a collection of ca. 500 news articles with known CC misinformation scraped
from web pages of CC counter-movement organisations [8]. Given that the latter corpus is
not publicly accessible at the moment, we exploit the former 4C corpus, and the associated
taxonomy of climate contrarianism, in our study.
      </p>
      <p>
        Data Augmentation (DA) is a family of techniques aiming at the creation of additional training
data in order to alleviate problems related to insufficient and imbalanced data and low data
variability, with the overall goal of improving model performance. Recently, DA has gained
attention in the NLP domain, and a wide range of DA techniques has been elaborated and
explored, including, i.a., simple word substitution, deletion, and insertion [9], sub-structure
substitution [10], back-translation [11
        <xref ref-type="bibr" rid="ref12">, 12</xref>
        ], contextual augmentation [13], data noising [1
        <xref ref-type="bibr" rid="ref15 ref4">4, 15</xref>
        ],
injection of noise into the embedding space [16], interpolating the vector representations of
texts and labels [17], etc. A survey on DA techniques for text classification is presented in [18],
whereas [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] provides a more general overview of DA in the broader area of NLP.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Data Augmentation</title>
      <p>For the purpose of carrying out DA experiments we have selected a range of known techniques and two variants
of known techniques, in particular focusing on transformations with a high probability that
the automatically created instances preserve their labels. The list of DA techniques encompasses:</p>
      <p>COPY: simply creates copies of the existing instances in the training dataset.</p>
      <p>DATE: randomly changes all dates, e.g., month and day-of-the-week names.</p>
      <p>DEL-ADJ-ADV: deletes up to a maximum of 1/3 of all adjectives and adverbs in the text,
provided that they are preceded by nouns and verbs, respectively. Here, the assumption is that such
a transformation preserves the label assigned to the text.</p>
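      <p>As an illustration, the DATE transformation can be sketched as follows; the word lists and the whole-word matching strategy are illustrative assumptions, not details prescribed above.</p>

```python
import random

# Illustrative word lists; the actual DATE transformation may cover
# more date expressions than month and weekday names.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def augment_date(text: str, rng: random.Random) -> str:
    """Replace every month/weekday name with a randomly chosen other one."""
    out = []
    for token in text.split():
        # strip trailing punctuation so "June," still matches "June"
        core = token.rstrip(".,;:!?")
        tail = token[len(core):]
        if core in MONTHS:
            core = rng.choice([m for m in MONTHS if m != core])
        elif core in WEEKDAYS:
            core = rng.choice([d for d in WEEKDAYS if d != core])
        out.append(core + tail)
    return " ".join(out)
```

      <p>For example, augment_date("Snow fell in June.", random.Random(0)) yields the same sentence with "June" replaced by a different month.</p>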
      <p>
        PUNCT: inserts various punctuation marks randomly selected from ('.', ';', '?', ':', '!', ',') into
randomly selected positions in the text, where the number of insertions is a randomly selected
number between 1 and 1/3 of the length of the text (in words). This simple DA technique,
introduced recently in [15], proved to outperform many other simple DA techniques.</p>
      <p>GEO: randomly replaces all occurrences of toponyms referring to a populated place with another
randomly chosen toponym from a Geonames-based (https://www.geonames.org/) gazetteer of about 200K populated places.</p>
      <p>PER-ORG: randomly replaces occurrences of mentions of person and organisation names
matched using the JRC Name Variant database [20] (containing a large fraction of the entities whose
mentions appear in the news) with some other names therefrom (excluding spelling variants of the
replaced names). The current version of JRC Name Variant contains circa 3 million names.</p>
      <p>
        SYN: randomly replaces verbs and adjectives with their synonyms. It picks the top-10 tokens
(verbs/adjectives) whose deletion maximizes the cosine distance from the resulting sentence's
embedding to that of the original sentence and replaces them with semantically close words. For
the first part, we exploit USE embeddings [21] and for the second, we approximate the semantic
proximity of words with Wikipedia pre-trained FastText embeddings [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], via the Gensim interface [23].
      </p>
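      <p>A minimal sketch of the PUNCT transformation, following the description above (insertion of 1 to 1/3-of-word-count punctuation marks at random positions):</p>

```python
import random

# The mark inventory from the PUNCT description above.
PUNCT_MARKS = ['.', ';', '?', ':', '!', ',']

def augment_punct(text: str, rng: random.Random) -> str:
    """Insert between 1 and len(words)//3 randomly chosen punctuation
    marks at randomly selected positions between the words."""
    words = text.split()
    k = rng.randint(1, max(1, len(words) // 3))
    for _ in range(k):
        pos = rng.randint(0, len(words))  # len(words) means append at the end
        words.insert(pos, rng.choice(PUNCT_MARKS))
    return " ".join(words)
```

      <p>Note that inserted marks appear as stand-alone tokens here; how exactly the marks are attached to neighbouring words is an implementation choice not fixed by the description.</p>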
      <p>SYN-REV: the same process as above, but differs in picking the top-10 tokens whose deletion
minimizes the cosine distance of the sentence's embedding.</p>
      <p>BACK-TRANSL: consists of translating the input text into some other language and then
translating the translation back into English [11, 12]. Here, we translated to French, German and
Polish and then back to English using an in-house NMT-based solution [24].</p>
      <p>Some examples of the application of the DA techniques enumerated above are provided in
Table 8 in Annex A. While most of these techniques were reported elsewhere, GEO and
PER-ORG, i.e., replacements of specific types of named entities, were, to the best of our knowledge, not
explicitly explored before. Based on empirical observations, the application of these transformations
results in label preservation with high probability, although the transformed texts might appear
'unrealistic' due to random name replacement. Furthermore, since the replacement is based on
a lexicon look-up, the transformation might by mistake replace entities of another type,
but, again, based on empirical observations, this does not have a high impact on the label.</p>
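      <p>The GEO-style replacement can be sketched as follows; the four-entry gazetteer is purely illustrative (the actual technique draws from a Geonames-based gazetteer of about 200K populated places), and the regex-based lexicon look-up is one possible realisation.</p>

```python
import random
import re

# Toy gazetteer standing in for the ~200K-entry Geonames-based one.
GAZETTEER = ["Istanbul", "Porto Alegre", "Warsaw", "Ispra"]

def augment_geo(text: str, rng: random.Random) -> str:
    """Replace every gazetteer toponym found in the text with another
    randomly chosen toponym from the gazetteer."""
    # match longer names first so multi-word toponyms win over prefixes
    names = sorted(GAZETTEER, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    def pick_other(match):
        current = match.group(0)
        return rng.choice([n for n in GAZETTEER if n != current])
    return pattern.sub(pick_other, text)
```

      <p>A PER-ORG sketch would look the same, with the gazetteer replaced by the JRC Name Variant entries and spelling variants of the replaced name excluded from the candidate pool.</p>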
      <p>Additionally, we explored ways of combining the DA techniques enumerated above:</p>
      <p>ALL: combination of the results of all the above DA techniques, each created separately; ALL-KB:
a variant of ALL combining only the DA techniques based on knowledge-based resources,
i.e., PUNCT, DEL-ADJ-ADV, DATE, GEO and PER-ORG; ALL-KB-STACKED: resulting
from running the techniques used in ALL-KB in a pipeline (in the order listed above) that progressively modifies
the same input text; and BEST-3: a combination of the 3 DA techniques whose
results were merged (not stacked) and which yield the best gain in performance (see Section 5).</p>
    </sec>
    <sec id="sec-4">
      <title>4. EMM-derived CC denial text snippet corpus</title>
      <p>
        In order to establish a benchmark corpus for the news domain and to test classification
performance, we relied on EMM. Articles were taken from a limited set of news sources that
disinformation experts had identified as frequently spreading misinformation. In order to limit the
dataset to articles on CC, we queried for articles containing keywords related to the topic, such
as 'climate change', 'global warming', 'greenhouse gas[es]' and 'greenhouse effect[s]', and limited the
publication date to the whole of 2021. Out of these, a random subset of 2500 articles was sampled.
For each article, we generated a snippet made of the title and of up to the first 500 characters.
The corpus was manually annotated by five disinformation experts, using the Codebook defined
in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. 1118 snippets were annotated, 42.7% of which are tagged with a class indicating a CC
denial narrative, while the remaining ones have been tagged No claim, i.e., not containing any
CC denial claim captured by the Codebook. In some snippets, while inflammatory language
superficially similar to CC denial was used, the texts actually embrace a polemical stance on
CC inaction. When the stance was ambiguous, the snippet was discarded, whereas the remaining
snippets containing an activist stance were assigned the label No claim.
      </p>
      <p>The statistics of the current version of the corpus are provided in Annex A in Table 7. Please note that this corpus is under active development and will be continuously extended.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Classification Experiments</title>
      <p>
        We have experimented with two ML paradigms, namely: (a) a linear SVM using the algorithm
described in [25] and the Liblinear library (https://www.csie.ntu.edu.tw/~cjlin/liblinear), with 3-6 character n-grams as binary features, using
vector normalization and n=1.0 resulting from parameter optimization, and (b) a RoBERTa
architecture [26] using batch size 32, learning rate 1e-5 and class weighting.
      </p>
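      <p>The character n-gram feature extraction for the SVM can be sketched as follows; reading "binary features" as presence/absence and "vector normalization" as L2 normalization is our interpretation.</p>

```python
import math

def char_ngrams(text: str, n_min: int = 3, n_max: int = 6) -> set:
    """Return the set of character n-grams of lengths n_min..n_max;
    using a set models the binary (presence/absence) feature scheme."""
    return {text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)}

def normalized_vector(features: set) -> dict:
    """Map each binary feature to its L2-normalized weight, so the
    resulting sparse vector has unit Euclidean length."""
    norm = math.sqrt(len(features))
    return {f: 1.0 / norm for f in features}
```

      <p>In practice such features would be fed to a linear SVM trainer such as Liblinear; the sketch only covers the feature side.</p>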
      <p>Prior to carrying out the ML experiments we cleaned the original 4C corpus [1] due to some
problems, i.a.: (a) some entries were included in both training and test data, often with
different labels, and (b) some entries were corrupt, i.e., had missing texts or non-parseable content.
We used this modified version of the 4C corpus, containing ca. 30 entries less. The 4C dataset
is highly imbalanced, i.e., more than 60% of the instances are labeled No claim, whereas 14
classes each constitute ca. 1-2% of the entire dataset (see Table 6 in Annex A for statistics).</p>
      <p>The results of the evaluation of SVM and RoBERTa on the 4C corpus without any DA
are presented in Table 1, where we explored SVM both with and without class weighting. The
performance of the baseline RoBERTa is similar to that of its counterpart reported in [1].</p>
      <p>As regards the DA techniques, we have augmented all instances of all CC-denial classes, whereas
the No claim class was not augmented. Each to-be-augmented instance was augmented n ∈
{1, 2, 4} times and the experiments were repeated 3 times. The gain/loss obtained for the
SVM- and RoBERTa-based models for all DA techniques is reported in Tables 2 and 3, respectively, with the
best results per measure and number of augmentations marked in bold. In all experiments all
original training data was used as well. BEST-3 refers to a combination of the 3 DA techniques, each
run separately, which yield the best gain in performance and were: (a) PUNCT, BACK-TRANSL and
GEO for SVM, and (b) PUNCT, GEO and PER-ORG for weighted SVM and RoBERTa.</p>
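      <p>The augmentation protocol described above (augment CC-denial instances n times, leave No claim untouched, keep all original data) can be sketched as follows; the (text, label) list interface is an illustrative assumption.</p>

```python
import random

def augment_training_set(data, augment_fn, n_aug, rng):
    """Augment every instance of a CC-denial class n_aug times, leave
    'No claim' instances as-is, and keep all original instances
    alongside the augmented ones.
    `data` is a list of (text, label) pairs; `augment_fn` is any DA
    transformation taking (text, rng) and returning a new text."""
    out = list(data)  # always keep the full original training data
    for text, label in data:
        if label != "No claim":
            out.extend((augment_fn(text, rng), label) for _ in range(n_aug))
    return out
```

      <p>Any of the transformations sketched earlier (e.g. a DATE or PUNCT function) could be passed as augment_fn.</p>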
      <p>[Tables 2 and 3: gain/loss in accuracy and macro F1 for the SVM- and RoBERTa-based models per DA technique, for 1, 2 and 4 augmentations per instance.]</p>
      <p>As regards weighted SVM, one can observe that the overall highest gain in macro F1 was obtained
with the ALL-KB setting (+1.6) with a 1-per-instance augmentation, while BEST-3 obtained the highest
gain for 2 and 4 augmentations (1.4 and 0.9, respectively). PUNCT appears to be the best stand-alone
DA technique, with some gains above 1.0. Applying simple copying (COPY) beats many other
DA techniques (macro F1 improved by up to +0.9), although it is outperformed by the ones
mentioned earlier. The two new DA techniques, i.e., GEO and PER-ORG, yield a positive gain
in all set-ups, while the usage of DATE, SYN, SYN-REV and BACK-TRANSL in a stand-alone
mode does not appear to be beneficial, i.e., they show close to zero gain or a deterioration. The DA gains for
unweighted SVM are higher, but since the best setting (BEST-3) for the unweighted SVM case is
worse than the weighted SVM baseline, we do not analyze it any further.</p>
      <p>As regards the RoBERTa-based models, one can observe that DA consistently deteriorates
the accuracy on average, whereas for most of the basic DA techniques there is little or
no gain at all in terms of macro F1, with BACK-TRANSL exhibiting the highest gain (+0.9),
followed by PUNCT (+0.6). The composite DA techniques perform better on average, with the
highest gain of 0.7 for ALL, which is higher than when applying simple COPY (0.4). Such results
are consistent with recent literature exploring data augmentation techniques with RoBERTa in
the related field of propaganda technique classification [27].</p>
      <p>RoBERTa's deterioration could possibly be explained by potential overfitting to the full
sentence structure due to too similar sentences, given the tendency of neural networks to overfit [18].
However, we also observe that this phenomenon diminishes with more augmentations. While
DATE should have the least impact on the label, it showed the most important and consistent
drop in performance. A better understanding of this behaviour requires further investigation.</p>
      <p>Interestingly, we have observed that PUNCT, SYN and SYN-REV were the three basic DA
techniques with the highest variance (up to ca. 1.0 difference in the macro F1 gain across different
experiments), and the same could be observed for the composite methods that include these
basic DA methods. In particular, given that the simple PUNCT method performs overall best
across the different settings, one could explore in the future potential improvements that could be
gained through some tuning, e.g., limiting the positions in which punctuation signs are inserted
and/or studying which punctuation sign insertions result in higher gains in performance.</p>
      <p>We have applied the baseline and some DA-boosted models to the EMM-derived corpus
described in Section 4; their performance is summarized in Table 4. The deterioration in
performance vis-a-vis the 4C corpus evaluation could be mainly due to the different nature of the
EMM corpus (text structure and writing style). Noteworthy, the evaluation on the EMM dataset
revealed that the models trained using DA consistently outperform the baseline models. As
regards the RoBERTa-based data-augmented models, the gain ranges from -0.7 to +2.8 and from -0.7
to +4.6 in accuracy and macro F1 scores, respectively, with the vast majority being positive.
The boost is the result of higher recall in the DA-trained models. For the sake of completeness,
the confusion matrix for the RoBERTa model boosted with BACK-TRANSL augmentation
(reported in Table 4) is provided in Figure 1 in Annex A.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Data Augmentation Impact on Reducing the Bias</title>
      <p>In order to better understand the behavior of the DA techniques relying on proper name
replacement, namely GEO and PER-ORG, we performed additional experiments with alternate
versions, analysing the distribution of named entities. This is motivated by the finding
that texts containing disinformation are often very specific about the entities involved. These
alternate techniques are characterized by a different sampling strategy for the entities to be
inserted. In contrast to the GEO and PER-ORG experiments, the replacement named entities are
not taken from a larger pool of entities, but instead are taken from the pool of the entities
that are detected in the texts. We define the additional experiments GEO-SP and
PER-ORG-SP, which correspond to the GEO and PER-ORG experiments using this modified
sampling on the CC-denial classes only; GEO-SP-ALL and PER-ORG-SP-ALL, where this
randomization procedure is applied to the CC-denial classes as well as to the No claim class;
and finally GEO-SP-STRICT and PER-ORG-STRICT, where the instances of all classes are
perturbed and only perturbed data is used. These experiments were only performed with
weighted SVM, using only one augmentation. We report the results in Table 5. We also compare
the augmented dataset and the original dataset using the Jensen-Shannon (JS) divergence on
two distributions: (a) that of the replaced entities, and (b) that of the labels associated with these entities.</p>
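      <p>The Jensen-Shannon divergence used for this comparison can be computed as follows for discrete distributions; representing the distributions as outcome-to-probability dicts and using base-2 logarithms (so the divergence lies in [0, 1]) are our implementation choices.</p>

```python
import math

def js_divergence(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence between two discrete distributions
    given as {outcome: probability} dicts: the mean of the KL
    divergences of p and q from their midpoint distribution m."""
    keys = set(p) | set(q)
    def kl(a, b):
        # Kullback-Leibler divergence, skipping zero-probability outcomes
        return sum(a.get(k, 0.0) * math.log2(a.get(k, 0.0) / b[k])
                   for k in keys if a.get(k, 0.0) > 0.0)
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

      <p>Identical distributions yield 0 and fully disjoint ones yield 1, which makes the measure convenient for comparing entity and label distributions before and after augmentation.</p>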
      <p>In the GEO and PER-ORG experiments, the entities in the instances of the CC-denial classes
were replaced with entities drawn from a much larger pool, practically removing these original
entities from the augmented data. The clearly lower performance of the *-STRICT experiments,
notably in terms of macro F1, seems to indicate that some classes rely heavily on the presence of
certain entities in order to be correctly predicted. This experiment is the only one not containing
the original data at all, and the distribution of replaced entities diverges the most from the
original dataset. Most of the errors are due to CC-denial texts being predicted as No claim, with
the 4_* classes having the most issues; this is coherent, as these classes are the most linked to
policies, and therefore to the corresponding actors.</p>
      <p>The *-SP experiments, where only CC-denial classes get augmented, show a small increase
in performance. The increase in performance is notable in the *-SP-ALL experiments, where
the No claim class also gets augmented. The distribution of entities diverges more than in
the case of *-SP, but the distribution of labels associated with these entities diverges less. The
combination of both the original dataset and the fully transformed one seems to yield the
best compromise between generalization and fitting to particular entities in the test dataset.
Exploring this interplay is an interesting direction for future work. Randomly swapping named
entities could change an actual disinformation claim into factual information or vice versa.
It is out of the scope of the classifier to deal with fact checking; however, it is important to
recognise the competing interest between a classifier that generalises well to unseen claims on
new entities and one that fits better to the known narratives.</p>
      <p>The *-SP experiments exhibit a performance on par with or lower than their equivalents
without their characteristic sampling. For GEO-SP there is a clear performance gap with respect
to GEO in terms of macro F1. The reason why the divergence of GEO appears lower than that of
GEO-SP is that it does not take into account the entities newly introduced by GEO. Overall,
both GEO, which introduces new entities, and GEO-SP, which changes the distribution of labels
associated with existing entities, tend to improve the macro F1 and accuracy.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusions</title>
      <p>We reported on preliminary experiments of using DA techniques for improving climate change
denial classification. The evaluation on the 4C corpus yielded a boost with data augmentation
of up to 1.6 and 0.9 gain in macro F1 for the SVM- and RoBERTa-based classifiers, respectively. For the vast
majority of the DA techniques the respective SVM-based models resulted in a gain, whereas for
most of the RoBERTa-based models a loss was observed. Analysing the new EMM-derived test
dataset introduced in this paper, with ca. 1K snippets, DA techniques lead to up to 4.6 point
gains in macro F1 vis-a-vis the baseline model. The overall performance is nevertheless worse than
on the 4C corpus, which was expected due to the different nature of the sources considered.</p>
      <p>We provided a more in-depth analysis of the behaviour of two DA techniques not reported
earlier, which randomly replace toponyms and person/organisation names, and which were
among the ones that resulted in the higher gains in macro F1 for the SVM-based models.</p>
      <p>We believe the reported findings will boost NLP research in the climate change domain. We
also make the cleaned version of the 4C corpus and the new EMM-derived corpus publicly accessible
at https://github.com/jpiskorski/CC-denial-resources.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Supplementary Information</title>
      <p>The statistics for the 4C (Contrarian Claims about Climate Change) corpus and the news-derived text
snippet corpus are presented in Tables 6 and 7, respectively. Please note that both datasets cover only a
fraction of the types (18 out of 27) of the CC contrarian claim taxonomy [1].</p>
      <p>In Istanbul, the snow could easily reach up to 30 cm in April, Mayor Kadir Topba announced.</p>
      <p>In Istanbul, the snow could reach up to 30 cm in June, Mayor Kadir Topba announced.</p>
      <p>In Istanbul, the snow; could easily reach up to? 30 cm in June, Mayor Kadir Topba: announced.</p>
      <p>In Porto Alegre, the snow could easily reach up to 30 cm in June, Mayor Kadir Topba announced.</p>
      <p>In Istanbul, the snow could easily reach up to 30 cm in June, Mayor Stephen King announced.</p>
      <p>In Istanbul, the snow could easily be up to 30 cm in June, Mayor Kadir Topba said.</p>
      <p>In Istanbul, Mayor Kadir Topba announced that the snow could easily be up to 30 cm high in June.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Coan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Boussalis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Nanko</surname>
          </string-name>
          ,
          <article-title>Computer-assisted classification of contrarian claims about climate change</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>11</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Stede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Patz</surname>
          </string-name>
          ,
          <article-title>The climate change debate and natural language processing</article-title>
          ,
          <source>in: Proceedings of the 1st Workshop on NLP for Positive Impact</source>
          , Association for Computational Linguistics, Online,
          <year>2021</year>
          , pp.
          <fpage>8</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Diakopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Elgesem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Salway</surname>
          </string-name>
          ,
          <article-title>Identifying and analyzing moral evaluation frames in climate change blog discourse</article-title>
          ,
          <source>in: Proceedings of the Eighth International AAAI Conference on Weblogs and Social Media</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>583</fpage>
          -
          <lpage>586</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kiritchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sobhani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cherry</surname>
          </string-name>
          ,
          <article-title>SemEval-2016 task 6: Detecting stance in tweets</article-title>
          ,
          <source>in: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)</source>
          ,
          <article-title>Association for Computational Linguistics</article-title>
          , San Diego, California,
          <year>2016</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Card</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <article-title>Detecting stance in media on global warming</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EMNLP 2020</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F. S.</given-names>
            <surname>Varini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Boyd-Graber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ciaramita</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leippold</surname>
          </string-name>
          ,
          <article-title>Climatext: A dataset for climate change topic detection</article-title>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Al-Rawi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>O'Keefe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-J.</given-names>
            <surname>Bizimana</surname>
          </string-name>
          ,
          <article-title>Twitter's fake news discourses around climate change and global warming</article-title>
          ,
          <source>Frontiers in Communication</source>
          <volume>6</volume>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Baldwin</surname>
          </string-name>
          ,
          <article-title>You are right. I am ALARMED - but by climate change counter movement</article-title>
          ,
          <source>CoRR abs/2004.14907</source>
          (
          <year>2020</year>
          ). arXiv:2004.14907.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>EDA: Easy data augmentation techniques for boosting performance on text classification tasks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Hong Kong, China,
          <year>2019</year>
          , pp.
          <fpage>6382</fpage>
          -
          <lpage>6388</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Livescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Gimpel</surname>
          </string-name>
          ,
          <article-title>Substructure substitution: Structured data augmentation for NLP</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>3494</fpage>
          -
          <lpage>3508</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sennrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Haddow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Birch</surname>
          </string-name>
          ,
          <article-title>Improving neural machine translation models with monolingual data</article-title>
          ,
          <source>in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Berlin, Germany,
          <year>2016</year>
          , pp.
          <fpage>86</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dohan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-T.</given-names>
            <surname>Luong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Norouzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. V.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <article-title>QANet: Combining local convolution with global self-attention for reading comprehension</article-title>
          ,
          <source>CoRR abs/1804.09541</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Kobayashi</surname>
          </string-name>
          ,
          <article-title>Contextual augmentation: Data augmentation by words with paradigmatic relations</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , New Orleans, Louisiana,
          <year>2018</year>
          , pp.
          <fpage>452</fpage>
          -
          <lpage>457</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. I.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lévy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Ng</surname>
          </string-name>
          ,
          <article-title>Data noising as smoothing in neural network language models</article-title>
          ,
          <source>in: 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings</source>
          , OpenReview.net,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>A.</given-names>
            <surname>Karimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prati</surname>
          </string-name>
          ,
          <article-title>AEDA: An easier data augmentation technique for text classification</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: EMNLP 2021</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Punta Cana, Dominican Republic,
          <year>2021</year>
          , pp.
          <fpage>2748</fpage>
          -
          <lpage>2754</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>A.</given-names>
            <surname>Karimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Prati</surname>
          </string-name>
          ,
          <article-title>Adversarial training for aspect-based sentiment analysis with BERT</article-title>
          ,
          <source>in: 2020 25th International Conference on Pattern Recognition (ICPR)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>8797</fpage>
          -
          <lpage>8803</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cissé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. N.</given-names>
            <surname>Dauphin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lopez-Paz</surname>
          </string-name>
          ,
          <article-title>mixup: Beyond empirical risk minimization</article-title>
          ,
          <source>CoRR abs/1710.09412</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bayer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaufhold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Reuter</surname>
          </string-name>
          ,
          <article-title>A survey on data augmentation for text classification</article-title>
          ,
          <source>CoRR abs/2107.03158</source>
          (
          <year>2021</year>
          ). URL: https://arxiv.org/abs/2107.03158. arXiv:2107.03158.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>S. Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gangal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chandar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vosoughi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mitamura</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <article-title>A survey of data augmentation approaches for NLP</article-title>
          ,
          <source>in: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021</source>
          ,
          <publisher-name>Association for Computational Linguistics</publisher-name>
          , Online,
          <year>2021</year>
          , pp.
          <fpage>968</fpage>
          -
          <lpage>988</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ehrmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jacquet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Steinberger</surname>
          </string-name>
          ,
          <article-title>JRC-Names: Multilingual entity name variants and titles as linked data</article-title>
          ,
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <year>2017</year>
          )
          <fpage>283</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>D.</given-names>
            <surname>Cer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-y.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Hua</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Limtiaco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>St. John</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Constant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Guajardo-Cespedes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Strope</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kurzweil</surname>
          </string-name>
          ,
          <article-title>Universal sentence encoder for English</article-title>
          ,
          <source>in: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics</source>
          , Brussels, Belgium,
          <year>2018</year>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>174</lpage>
          . URL: https://aclanthology.org/D18-2029. doi:10.18653/v1/D18-2029.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Puhrsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <article-title>Advances in pre-training distributed word representations</article-title>
          ,
          <source>in: Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>R.</given-names>
            <surname>Řehůřek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sojka</surname>
          </string-name>
          ,
          <article-title>Software Framework for Topic Modelling with Large Corpora</article-title>
          ,
          <source>in: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks</source>
          ,
          <publisher-name>ELRA</publisher-name>
          , Valletta, Malta,
          <year>2010</year>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>50</lpage>
          . URL: http://is.muni.cz/publication/884893/en.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>C.</given-names>
            <surname>Oravecz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Bontcheva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kolovratník</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bhaskar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jellinghaus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Eisele</surname>
          </string-name>
          ,
          <article-title>eTranslation's submissions to the WMT 2021 news translation task</article-title>
          , in:
          <string-name><given-names>L.</given-names> <surname>Barrault</surname></string-name>
          ,
          <string-name><given-names>O.</given-names> <surname>Bojar</surname></string-name>
          ,
          <string-name><given-names>F.</given-names> <surname>Bougares</surname></string-name>
          ,
          <string-name><given-names>R.</given-names> <surname>Chatterjee</surname></string-name>
          ,
          <string-name><given-names>M. R.</given-names> <surname>Costa-jussà</surname></string-name>
          ,
          <string-name><given-names>C.</given-names> <surname>Federmann</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Fishel</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Fraser</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Freitag</surname></string-name>
          ,
          <string-name><given-names>Y.</given-names> <surname>Graham</surname></string-name>
          ,
          <string-name><given-names>R.</given-names> <surname>Grundkiewicz</surname></string-name>
          ,
          <string-name><given-names>P.</given-names> <surname>Guzman</surname></string-name>
          ,
          <string-name><given-names>B.</given-names> <surname>Haddow</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Huck</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Jimeno-Yepes</surname></string-name>
          ,
          <string-name><given-names>P.</given-names> <surname>Koehn</surname></string-name>
          ,
          <string-name><given-names>T.</given-names> <surname>Kocmi</surname></string-name>
          ,
          <string-name><given-names>A.</given-names> <surname>Martins</surname></string-name>
          ,
          <string-name><given-names>M.</given-names> <surname>Morishita</surname></string-name>
          ,
          <string-name><given-names>C.</given-names> <surname>Monz</surname></string-name>
          (Eds.),
          <source>Proceedings of the Sixth Conference on Machine Translation, WMT@EMNLP 2021, Online Event, November 10-11, 2021</source>
          , Association for Computational Linguistics,
          <year>2021</year>
          , pp.
          <fpage>172</fpage>
          -
          <lpage>179</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Crammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Singer</surname>
          </string-name>
          ,
          <article-title>On the learnability and design of output codes for multiclass problems</article-title>
          ,
          <source>in: Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, COLT '00</source>
          , Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>2000</year>
          , pp.
          <fpage>35</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          ,
          <article-title>RoBERTa: A robustly optimized BERT pretraining approach</article-title>
          ,
          <source>CoRR abs/1907.11692</source>
          (
          <year>2019</year>
          ). URL: http://arxiv.org/abs/1907.11692. arXiv:1907.11692.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>V.</given-names>
            <surname>Gupta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>NLPIITR at SemEval-2021 task 6: RoBERTa model with data augmentation for persuasion techniques detection</article-title>
          ,
          <source>in: Proceedings of the 15th International Workshop on Semantic Evaluation (SemEval-2021)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1061</fpage>
          -
          <lpage>1067</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>