GuillemGSubies at IDPT2021: Identifying Irony
         in Portuguese with BERT

                                Guillem García Subies

Instituto de Ingeniería del Conocimiento, Francisco Tomás y Valiente st., 11 EPS, B
            Building, 5th floor UAM Cantoblanco. 28049 Madrid, Spain
                             guillem.garcia@iic.uam.es


        Abstract. This paper describes a system created for the IDPT 2021
        shared task, framed within the IberLEF 2021 workshop. We present an
        approach based mainly on fine-tuned BERT models, with hyperparameters
        selected by grid search and Data Augmentation through MLM substitution.
        Our models far outperform the baselines and achieve results close to
        the state of the art.

        Keywords: Irony Detection · BERT · Transformers · Data Augmenta-
        tion · BERTimbau


1     Introduction
Although irony can be relatively easy for humans to identify, it is not so easy
for NLP models to detect [5], mainly because the intended meaning is often
implicit and usually differs from the literal meaning of the words. This makes
irony detection a well-suited task for evaluating the progress of NLP systems.
    The IDPT (Irony Detection in Portuguese) shared task proposes, during this
third edition of the IberLEF [15] workshop, a corpus to detect irony in tweets
and news written in Portuguese [1]. This article summarizes our participation in
all the IDPT tasks.
    Given the success of Transformer-inspired language models [23], both in
academia and industry [24], we decided to use already pre-trained BERT [6]
models. Furthermore, their ability to understand contextual information can be
very useful for the irony detection task. Specifically, we use BERTimbau
[21] with a grid search over its fine-tuning hyperparameters. To address the
limited amount of training data, we use Data Augmentation techniques.

1.1    Task Description
There are two corpora, one for tweets and one for news (task1 and task2 respec-
tively). For both of them, the problem is binary classification, where the sample
can be ironic or not.
    IberLEF 2021, September 2021, Málaga, Spain.
    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).
    The tweets corpus has 15212 tweets and the news corpus has 18494 news
articles in their train splits. The data is collected from various preexisting
sources [8] [4] [14]. The test splits, in contrast, are composed of 300 tweets
and 300 news articles gathered and annotated by the organizers of the tasks.
Since the test data was gathered separately from the training data, the
evaluation favors models that generalize well.
    The metric used to evaluate the results is balanced accuracy, mainly
because both datasets are very unbalanced, as shown in Table 1. The difference
between the tasks is also notable: most of the tweets are ironic, while most of
the news articles are not.


               Tweets                        News
               Class        Nº of samples    Class        Nº of samples
               ironic       12736            ironic        7222
               not ironic    2476            not ironic   11272
                          Table 1. Distribution of Samples
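
    To illustrate why balanced accuracy is preferable to plain accuracy on such
unbalanced data, the following minimal sketch computes both metrics for a
trivial classifier that always predicts the majority class. The class counts
are those of the tweets train split in Table 1; the function names are
illustrative and are not part of the official evaluation code of the task.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls (illustrative helper)."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Class counts of the tweets train split, taken from Table 1.
y_true = ["ironic"] * 12736 + ["not ironic"] * 2476
# A trivial classifier that always predicts the majority class.
y_pred = ["ironic"] * len(y_true)

print(f"accuracy          = {accuracy(y_true, y_pred):.4f}")
print(f"balanced accuracy = {balanced_accuracy(y_true, y_pred):.4f}")
```

Under these assumptions the trivial classifier reaches a plain accuracy of
about 0.84 but a balanced accuracy of only 0.5, because its recall on the
minority class is zero.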


1.2   Goals

This work focuses on showing that open source resources and relatively small
language models (compared to the newest models like GPT-3 [2]) are enough to
obtain state-of-the-art results. Specifically, the main goal is to obtain a
Portuguese language model that can detect irony in text while meeting the
requirements stated above.


1.3   Summary of the proposal

To achieve the goals explained above, we fine-tune a BERTimbau model [21]
with hyperparameters optimized through grid search. Before fine-tuning, the
data is preprocessed with simple heuristics and then augmented by masking words
and replacing them with the predictions of a Masked Language Model.
   In the next section, we briefly review previous work related to this
topic. Then, in Section 3, we explain the main ideas behind the proposed
models and the experiments we ran. In Section 4 we summarize the results we
obtained. Finally, in Section 5 we present the main conclusions of our work
and propose some ideas for future work.


2     Related Work

There is an extensive bibliography on Sentiment Analysis and irony detection in
social media, given the high scientific interest in solving such a difficult
problem. Some early attempts to create corpora in this field targeted sarcasm
detection. For instance, Davidov et al. used Amazon Mechanical Turk to create a
corpus with 5.9 million tweets [5], and Riloff et al. explored the
identification of sarcastic tweets in which a positive word or comment is
followed by an undesirable situation [20].
    There have also been some attempts to create irony detection corpora in
languages other than English. For example, Ptáček et al. [19] created a Czech
binary sarcasm classification dataset of tweets and also proposed n-gram and
heuristics-based embeddings that are fed into classic machine learning models.
Liebrecht et al. [12] collected a Dutch sarcasm dataset from tweets that
included the hashtag #sarcasm and hypothesized that this hashtag is the digital
equivalent of non-verbal expressions in live interactions. Ghanem et al. [10]
collected irony detection datasets in different languages in order to show that
good models can be trained even when the data for some language is scarce.
    Following this trend, there have been many irony detection competitions
in recent years. For instance, IDAT [9] proposed a binary classification
problem to detect irony in tweets written in Arabic. The best model was a
feature-based one with classic machine learning models, outperforming even BERT
models. IroSvA [16] proposed a binary irony classification problem for tweets
in different Spanish variants. This time, the best model used Word2Vec
embeddings to initialize the weights of a Transformer model.
    For the Portuguese language, Carvalho et al. [3] detect irony in newspaper
comments using simple glossaries and argue that complex linguistic features do
not work well for irony. Following the same trend, Freitas et al. [7] create a
list of relevant patterns to detect irony in Portuguese tweets.
    Notably, some of the models used in these works still rely on linguistic
features and heuristics to detect irony. We instead focus on the potential
of language models to solve this task without any linguistic features.

3     Models
3.1   Data Preprocessing
We performed a simple preprocessing where we substituted some expressions
with a more normalized form:
 – Every URL was replaced with the token “[URL]”, so that the tokenizer does
   not produce strange tokens when it processes a URL. Furthermore, no semantic
   information about irony can be inferred from a URL; the only information
   relevant for the model is that a URL is present at that position.
 – The hashtag characters (“#”) were deleted (“#example” → “example”) because
   the base language models we use are trained on generic text and might not
   understand their meaning. Furthermore, most hashtags are used in the same
   way as normal words.
 – We replaced every username with the generic token “[USER]” because the
   exact name of a user does not really add any information about the irony.
   The only relevant feature is whether someone was mentioned, not who.
 – Finally, we normalized every laugh (“jasjajajajj” → “haha”) to minimize
   the noise from misspellings, which are common in social networks.
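
The four steps above can be sketched with regular expressions as follows. This
is only an illustration: the exact rules used in our system are not listed in
this paper, so every pattern below (in particular the definition of a laugh)
is an assumption, and the function name preprocess is hypothetical.

```python
import re

# Illustrative patterns (assumptions, not the exact rules of the system).
URL_RE = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
USER_RE = re.compile(r"(?<!\w)@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
# A laugh is assumed to be a word made only of the letters h/j/a/s that
# contains "haha" or "jaja"; the "s" tolerates typos such as "jasjajajajj".
LAUGH_RE = re.compile(r"\b[hjas]*(?:haha|jaja)[hjas]*\b", re.IGNORECASE)


def preprocess(text: str) -> str:
    """Apply the four normalization steps in a fixed order."""
    text = URL_RE.sub("[URL]", text)    # URLs first: they may contain # or @
    text = USER_RE.sub("[USER]", text)  # mentions -> generic token
    text = HASHTAG_RE.sub(r"\1", text)  # "#example" -> "example"
    text = LAUGH_RE.sub("haha", text)   # every laugh -> "haha"
    return text


print(preprocess("@joao olha isso https://t.co/abc #ironia jasjajajajj"))
# [USER] olha isso [URL] ironia haha
```

URLs are replaced first because they may contain “#” or “@” characters that
the later patterns would otherwise rewrite. The laugh pattern is deliberately
conservative, so variants such as “kkkk” or “rsrs” are not covered by this
sketch.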
3.2   Baselines
As a baseline against which to compare our models, we selected a
HashingVectorizer followed by a RandomForest classifier. This lets us compare
our models with a classic feature extraction approach.
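
As a rough illustration of the feature extraction behind this baseline, the
sketch below implements the hashing trick with the standard library only. It
is not the scikit-learn HashingVectorizer itself, which differs in its hash
function and default settings; the function name, the tokenization and the
vector size are assumptions chosen for readability.

```python
import re
import zlib


def hashing_vectorize(text, n_features=16):
    """Map a text to a fixed-size count vector using the hashing trick."""
    vector = [0] * n_features
    for token in re.findall(r"\w+", text.lower()):
        # crc32 is stable across runs; Python's built-in hash() is salted.
        index = zlib.crc32(token.encode("utf-8")) % n_features
        vector[index] += 1
    return vector


vec = hashing_vectorize("que dia lindo , que maravilha")
print(len(vec), sum(vec))  # 16 5
```

No vocabulary has to be stored, since every token is mapped directly to a
column index. The price is that different tokens may collide in the same
column. A classifier such as a random forest is then trained on these vectors.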

3.3   Language Models
We used BERTimbau [21], a Portuguese BERT model that outperforms mBERT
and the previous state-of-the-art. Specifically, we used the large model,
bert-large-portuguese-cased. For the fine-tuning process, we carried out a
grid search over the main hyperparameters of the neural network: learning rate,
batch size and dropout rate. The search was performed with 5-fold stratified
cross-validation over the following grid: learning rate, (1e-6, 1e-5, 3e-5,
5e-5, 1e-4); batch size, (8, 16, 32); and dropout rate, (0.08, 0.1, 0.12). The
best hyperparameters for both models were: learning rate, 1e-5; batch size, 16;
and dropout rate, 0.1.
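
The search procedure can be sketched as follows. The grid values are the ones
listed above; everything else is an assumption made for illustration. In
particular, evaluate is a hypothetical callable that would fine-tune a model
on the training indices and return its validation score, and the fold
assignment is a simple deterministic round-robin rather than the routine used
in our experiments.

```python
from collections import defaultdict
from itertools import product

# Search space exactly as listed in the text above.
GRID = {
    "learning_rate": (1e-6, 1e-5, 3e-5, 5e-5, 1e-4),
    "batch_size": (8, 16, 32),
    "dropout": (0.08, 0.1, 0.12),
}


def grid_configurations(grid):
    """Yield one dict per point of the Cartesian product of the grid."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def stratified_folds(labels, n_folds=5):
    """Deal the indices of each class round-robin across the folds,
    so that every fold keeps roughly the original class ratio."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds


def grid_search(labels, evaluate, grid=GRID, n_folds=5):
    """Return (best_config, best_score) by mean validation score.

    evaluate(config, train_idx, valid_idx) is a hypothetical callable.
    """
    folds = stratified_folds(labels, n_folds)
    best_config, best_score = None, float("-inf")
    for config in grid_configurations(grid):
        scores = []
        for k, valid_idx in enumerate(folds):
            train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
            scores.append(evaluate(config, train_idx, valid_idx))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_config, best_score = config, mean_score
    return best_config, best_score


configs = list(grid_configurations(GRID))
print(len(configs), "configurations,", len(configs) * 5, "fine-tuning runs")
# 45 configurations, 225 fine-tuning runs
```

The grid contains 5 × 3 × 3 = 45 configurations. Assuming each configuration
is trained once per fold, an exhaustive search requires 225 fine-tuning runs
per task.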

3.4   Data Augmentation
As the dataset is relatively small, we decided to apply Data Augmentation
techniques. The selected strategy was to mask words and substitute them using
a Masked Language Model, BERTimbau. For every sample in the dataset, we
randomly masked 15% of the tokens and used BERTimbau to predict them, creating
a modified sample. With this method, we obtained a training set twice the size
of the original one.
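
The masking-and-substitution step can be sketched as below. To keep the
example self-contained, the Masked Language Model is passed in as a callable,
and a toy function stands in for BERTimbau. The whitespace tokenization, the
rule of masking at least one token and all function names are assumptions of
this sketch, which does not use the nlpaug library.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_ratio=0.15, rng=random):
    """Return a copy of `tokens` with roughly `mask_ratio` of them masked."""
    if not tokens:
        return []
    # Assumption: always mask at least one token, even in very short texts.
    n_to_mask = max(1, round(len(tokens) * mask_ratio))
    positions = set(rng.sample(range(len(tokens)), n_to_mask))
    return [MASK if i in positions else tok for i, tok in enumerate(tokens)]


def augment(texts, fill_mask, mask_ratio=0.15, rng=random):
    """Return the original texts followed by one modified copy of each.

    fill_mask(tokens) is a hypothetical callable that replaces every
    masked token with a prediction and returns the new token list.
    """
    augmented = []
    for text in texts:
        masked = mask_tokens(text.split(), mask_ratio, rng)
        augmented.append(" ".join(fill_mask(masked)))
    return list(texts) + augmented


def toy_fill_mask(tokens):
    """Stand-in for the MLM: replaces every mask with a fixed word."""
    return ["coisa" if tok == MASK else tok for tok in tokens]


texts = ["que dia lindo para ficar preso no trânsito", "adoro segunda-feira"]
result = augment(texts, toy_fill_mask, rng=random.Random(0))
print(len(texts), "->", len(result))  # 2 -> 4
```

In a real run, the toy predictor would be replaced by a call to the masked
language model, and each modified sample would keep the label of the sample
it was derived from.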


4     Experiments and Results
4.1   Experimental Setup
The software we used was Python 3.8, transformers 4.5.1 [24], PyTorch 1.8.1
[17], scikit-learn 0.24.1 [18] and nlpaug 1.1.3 [13].

4.2   Results
Table 2 shows the results of our models on the test set of the first
task. Our runs for this task are BERTimbau and BERTimbau-aug, without and with
data augmentation, respectively, as explained in Sections 3.3 and 3.4.
    We can see that the language model far outperforms classic methods like
the hashing trick with a random forest. We can also see that, although Data
Augmentation does not provide a large performance boost, it still yields
slightly better models. All in all, our models obtain good results given
their simplicity, which suggests that finding the right hyperparameters is
crucial for optimizing performance. These results placed fourth among all
the participating teams, which indicates that our approach is competitive
despite its simplicity and the lack of any linguistic analysis.


                             Model            bacc
                             HV+RF            0.3316
                             BERTimbau        0.4912
                             BERTimbau-aug    0.5000
                 Table 2. Results for task1 (bacc: balanced accuracy)


   For the second task, the results were not as good as the ones obtained in
the first task; Table 3 shows them in more detail. It appears that BERTimbau
did not perform as well on the news dataset.


                             Model            bacc
                             HV+RF            0.5423
                             BERTimbau        0.7804
                             BERTimbau-aug    0.7858
                 Table 3. Results for task2 (bacc: balanced accuracy)


5   Conclusions and Future Work
Through this shared task, we have seen that NLP can be of great help in
detecting irony in natural language from social networks, although there is
still a long way to go. The results obtained by our systems are promising given
their performance and their simplicity. This combination of methods is
significant because it could lead to much better results when combined with
other improvements from the state of the art. In particular, the Data
Augmentation approach together with the grid search has proven to work well in
this context. We therefore consider that we have achieved our goals for this
shared task.
    However, we believe that our results could improve considerably with
language models trained specifically on corpora from social networks. Another
interesting approach would be to take a general language model and further
pre-train it on corpora from the same domain as the final task [22]. Finally,
we have seen that good hyperparameters are also key for a good neural network,
so a better search, such as Population Based Training [11], could further
improve the model.


Acknowledgments
This work has been partially funded by the Instituto de Ingeniería del Conocimiento
(IIC) and the hardware used was also provided by the IIC.
References
 1. Brisolara Corrêa, U., Pereira dos Santos, L., Coelho, L., A. de Freitas, L.: Overview
    of the IDPT task on irony detection in Portuguese at IberLEF 2021. Procesamiento
    del Lenguaje Natural 67 (2021)
 2. Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Nee-
    lakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A.,
    Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Win-
    ter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark,
    J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., Amodei, D.: Language
    models are few-shot learners (2020)
 3. Carvalho, P., Sarmento, L., Silva, M.J., De Oliveira, E.: Clues for detecting irony
    in user-generated contents: oh...!! it’s “so easy” ;-). In: Proceedings of the 1st
    International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion.
    pp. 53–56 (2009)
 4. da Silva, F.R.A.: Detecção de ironia e sarcasmo em língua portuguesa: Uma
    abordagem utilizando deep learning (2018)
 5. Davidov, D., Tsur, O., Rappoport, A.: Semi-supervised recognition of sarcastic
    sentences in Twitter and Amazon. In: Proceedings of the Fourteenth Conference on
    Computational Natural Language Learning. pp. 107–116. CoNLL ’10, Association
    for Computational Linguistics, USA (2010)
 6. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep
    bidirectional transformers for language understanding (2018)
 7. Freitas, L., Vanin, A., Vieira, R., Bochernitsan, M.: Some clues on irony detection
    in tweets. In: WWW 2013 Companion - Proceedings of the 22nd International Con-
    ference on World Wide Web (05 2013). https://doi.org/10.1145/2487788.2488012
 8. de Freitas, L.A., Vanin, A.A., Hogetop, D.N., Bochernitsan, M.N., Vieira, R.: Path-
    ways for irony detection in tweets. In: Proceedings of the 29th Annual ACM Sym-
    posium on Applied Computing. pp. 628–633 (2014)
 9. Ghanem, B., Karoui, J., Benamara, F., Moriceau, V., Rosso, P.: IDAT at FIRE2019:
    Overview of the track on irony detection in Arabic tweets. In: Proceedings of the
    11th Forum for Information Retrieval Evaluation. pp. 10–13 (2019)
10. Ghanem, B., Karoui, J., Benamara, F., Rosso, P., Moriceau, V.: Irony detection
    in a multilingual context. In: Jose, J.M., Yilmaz, E., Magalhães, J., Castells, P.,
    Ferro, N., Silva, M.J., Martins, F. (eds.) Advances in Information Retrieval. pp.
    141–149. Springer International Publishing, Cham (2020)
11. Jaderberg, M., Dalibard, V., Osindero, S., Czarnecki, W.M., Donahue, J., Razavi,
    A., Vinyals, O., Green, T., Dunning, I., Simonyan, K., Fernando, C., Kavukcuoglu,
    K.: Population based training of neural networks (2017)
12. Liebrecht, C., Kunneman, F., van den Bosch, A.: The perfect solution for de-
    tecting sarcasm in tweets #not. In: Proceedings of the 4th Workshop on Com-
    putational Approaches to Subjectivity, Sentiment and Social Media Analysis. pp.
    29–37. Association for Computational Linguistics, Atlanta, Georgia (Jun 2013),
    https://www.aclweb.org/anthology/W13-1605
13. Ma, E.: NLP augmentation. https://github.com/makcedward/nlpaug (2019)
14. Marten, G.S., de Freitas, L.A.: The construction of a corpus for detecting irony
    and sarcasm in Portuguese. Brazilian Journal of Development 7(5), 47973–47984
    (2021)
15. Montes, M., Rosso, P., Gonzalo, J., Aragón, E., Agerri, R., Ángel Álvarez Carmona,
    M., Álvarez Mellado, E., de Albornoz, J.C., Chiruzzo, L., Freitas, L., Adorno, H.G.,
    Gutiérrez, Y., Zafra, S.M.J., Lima, S., de Arco, F.M.P., Taulé, M.: Proceedings of
    the Iberian Languages Evaluation Forum (IberLEF 2021). In: CEUR Workshop
    Proceedings (2021)
16. Ortega-Bueno, R., Rangel, F., Hernández Farías, D., Rosso, P., Montes-y-Gómez,
    M., Medina Pagola, J.E.: Overview of the task on irony detection in Spanish
    variants. In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF
    2019), co-located with 34th Conference of the Spanish Society for Natural
    Language Processing (SEPLN 2019). CEUR-WS.org, vol. 2421, pp. 229–256 (2019)
17. Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T.,
    Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito,
    Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., Chin-
    tala, S.: Pytorch: An imperative style, high-performance deep learning library. In:
    Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., Garnett,
    R. (eds.) Advances in Neural Information Processing Systems 32, pp. 8024–8035.
    Curran Associates, Inc. (2019), http://papers.neurips.cc/paper/9015-pytorch-an-
    imperative-style-high-performance-deep-learning-library.pdf
18. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
    Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A.,
    Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine
    learning in Python. Journal of Machine Learning Research 12, 2825–2830 (2011)
19. Ptáček, T., Habernal, I., Hong, J.: Sarcasm detection on Czech and English
    Twitter. In: Proceedings of COLING 2014, the 25th International Conference on
    Computational Linguistics: Technical Papers. pp. 213–223. Dublin City Univer-
    sity and Association for Computational Linguistics, Dublin, Ireland (Aug 2014),
    https://www.aclweb.org/anthology/C14-1022
20. Riloff, E., Qadir, A., Surve, P., De Silva, L., Gilbert, N., Huang, R.: Sarcasm as
    contrast between a positive sentiment and negative situation. In: Proceedings of
    the 2013 Conference on Empirical Methods in Natural Language Processing. pp.
    704–714. Association for Computational Linguistics, Seattle, Washington, USA
    (Oct 2013), https://www.aclweb.org/anthology/D13-1066
21. Souza, F., Nogueira, R., Lotufo, R.: BERTimbau: pretrained BERT models
    for Brazilian Portuguese. In: 9th Brazilian Conference on Intelligent Systems,
    BRACIS, Rio Grande do Sul, Brazil, October 20-23 (to appear) (2020)
22. Sun, C., Qiu, X., Xu, Y., Huang, X.: How to fine-tune BERT for text classification?
    (2020)
23. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser,
    L., Polosukhin, I.: Attention is all you need (2017)
24. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P.,
    Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C.,
    Jernite, Y., Plu, J., Xu, C., Scao, T.L., Gugger, S., Drame, M., Lhoest, Q., Rush,
    A.M.: Transformers: State-of-the-art natural language processing. In: Proceedings
    of the 2020 Conference on Empirical Methods in Natural Language Processing:
    System Demonstrations. pp. 38–45. Association for Computational Linguistics,
    Online (Oct 2020), https://www.aclweb.org/anthology/2020.emnlp-demos.6