NLP4SM: Natural Language Processing for Social Media

Gonzalo Medina Medina1, Jose Camacho Collados2 and Eugenio Martínez Cámara1

1 Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Spain
2 School of Computer Science and Informatics, Cardiff University, United Kingdom

Abstract
NLP4SM is a website for the execution, analysis and comparison of tweet classification methods based on language models. NLP4SM currently supports the text classification tasks considered in TweetEval, but it aims to integrate additional text classification tasks and to widen the number of available language models, with the goal of becoming a benchmark platform for assessing text classification methods on real data from social media.

Keywords
Language models, text classification, social media.

1. Introduction

The most likely source of the vertiginous progress of Natural Language Processing (NLP) in recent years is the proposal of the Word2Vec model [1], which eases the generation of unsupervised linguistic features known as word embeddings, which represent the meaning of words as vectors of real numbers. The strong results reached by Word2Vec-based word embeddings encouraged the design of new word embedding models, such as GloVe.1 These models assign a single embedding vector to each word regardless of its context, and for this reason the next landmark was set by contextual word embedding models [2]. Transformer models stand out among contextual word embeddings, with BERT [3] as an outstanding example. These models are known as language models, and their capacity to represent the meaning of words, coupled with the possibility of using them as pre-trained models, has driven the progress of a broad branch of NLP tasks, especially those linked to the classification of the semantic meaning of a text, such as the opinion polarity of a review, or the offensive or underlying emotional meaning of a message.

The potential of language models has made them the baseline of a wide range of NLP tasks, and they can even be used for developing learning models in production environments. Moreover, the ease of tuning these models to specific NLP tasks has led to the development and release of a huge number of pre-trained language models for a wide range of NLP tasks, with HuggingFace and especially its Transformers library [4] standing out. This vast variety of language models makes their comparison and analysis really difficult as a previous step to choosing the particular language model to fine-tune for a specific use case.

1 https://nlp.stanford.edu/projects/glove/
The particular use of language in social networks makes it necessary to adapt NLP methods to the specific use of language of each social network, as for instance Twitter [5]. Language models also need this fitting to the language of social networks, which makes them reach the top of most NLP shared tasks.

The great availability of language models has not been coupled with the release of web platforms for comparing and analysing the different language models on specific NLP tasks. Nevertheless, the related issue of the great availability of training corpora and the evaluation of learning models is beginning to be resolved by the publication of leaderboards of learning models trained on gold standards, such as SuperGLUE [6] or TweetEval [7].

SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations, September 21-23, 2022, A Coruña, Spain
gmedina95@correo.ugr.es (G. Medina Medina); camachocolladosj@cardiff.ac.uk (J. Camacho Collados); emcamara@decsai.ugr.es (E. Martínez Cámara)
ORCID: 0000-0003-1618-7239 (J. Camacho Collados); 0000-0002-5279-8355 (E. Martínez Cámara)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).
Following the example of the NLP classification task leaderboards, we present the web platform NLP4SM,2,3 whose demonstrative prototype is described in this paper. NLP4SM is a web application for analysing the performance of Twitter language models fine-tuned to the tasks of (1) sentiment analysis, (2) emotion analysis, (3) offensive language classification, (4) hate speech classification, (5) irony detection and (6) stance classification on abortion, climate change, atheism, feminism and Hillary Clinton. NLP4SM allows, on the one hand, the classification of a free span of text and, on the other hand, the classification of the meaning of a set of tweets returned by Twitter. Furthermore, the classification results are shown as charts to ease their understanding. NLP4SM can be used both by non-NLP experts and by NLP scientists who need to compare different language models on one of the mentioned tasks with real data.

2 Prototype: https://nlp4sm.on.fleek.co/
3 Production [8]: https://tweetnlp.org/demo/
The design of the system allows the consideration of new language models for the previous NLP tasks, as well as the incorporation of new result visualisation methods.

2. Language Models in NLP4SM

The first version of NLP4SM incorporates learning models that classify the meaning of tweets. The learning models are based on the fine-tuning of Twitter language models to the specific NLP tasks, which we describe below.

2.1. NLP tasks

We selected the NLP tasks according to their scientific relevance, as well as the high social demand for automatic systems that can identify specific kinds of messages. The tasks are also part of TweetEval, and we present them as follows.

Emotion analysis: It identifies the underlying emotion of a text. Although it is a multi-label task, we redefined it as a multi-class classification task. The corpus "Affect in Tweets" [9] was used to fit the model to the most frequent emotions of the corpus: joy, optimism, anger and sadness.4

Sentiment analysis: It classifies the opinion polarity as positive, negative or neutral. The corpus of subtask A of the "Sentiment Analysis in Twitter" task of SemEval17 [10] was used to fit the model.5

Hate speech: It aims at classifying whether a tweet expresses hate. The corpus of HatEval from SemEval19 was used to fit the model [11].6

Irony detection: The goal is to classify whether a tweet is ironic. The corpus of the Irony Detection task from SemEval18 was used to fit the model [12].7

Offensive language: It identifies whether a span of text has an offensive meaning. The corpus of OffensEval from SemEval19 was used to fit the model [13].8

Emoji prediction: It aims at predicting the emoji that best represents the meaning of a tweet. The corpus of Emoji Prediction from SemEval18 was used to fit the model [14].9

Stance classification: It classifies the author's stance towards a topic. The corpus of the Detecting Stance task from SemEval16 was used to fit the model. The topics considered are: abortion,10 atheism,11 feminism,12 climate change13 and Hillary Clinton.14

Multilinguality: Social networks are multilingual, and for this reason NLP4SM also allows the analysis of multilingual language models, namely those based on XLM-R [15], which is fitted on a large set of tweets written in more than 50 languages. NLP4SM also provides the XLM-T language model fitted to the sentiment analysis task in eight different languages [16].

2.2. Language Models

The language models currently included in NLP4SM match the ones in TweetEval, and they are available in HuggingFace. We have used the RoBERTa-base model [17] pre-trained on English text from social networks [7]. The fine-tuning of RoBERTa-base to each NLP task is based on an output layer with the same number of output units as the number of classes of each task [17]. The language models used are described and linked in Section 2.1.

4 https://huggingface.co/cardiffnlp/twitter-roberta-base-emotion
5 https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment
6 https://huggingface.co/cardiffnlp/twitter-roberta-base-hate
7 https://huggingface.co/cardiffnlp/twitter-roberta-base-irony
8 https://huggingface.co/cardiffnlp/twitter-roberta-base-offensive
9 https://huggingface.co/cardiffnlp/twitter-roberta-base-emoji
10 https://huggingface.co/cardiffnlp/twitter-roberta-base-stance-abortion
11 https://huggingface.co/cardiffnlp/twitter-roberta-base-stance-atheism
12 https://huggingface.co/cardiffnlp/twitter-roberta-base-stance-feminist
13 https://huggingface.co/cardiffnlp/twitter-roberta-base-stance-climate
14 https://huggingface.co/cardiffnlp/twitter-roberta-base-stance-hillary
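Each task is thus served by a RoBERTa-base checkpoint fine-tuned on the corresponding TweetEval corpus and hosted on HuggingFace. The following is a minimal sketch, not part of NLP4SM's actual code, of how a client could resolve a task name to its model repository and build a classification request for the HuggingFace hosted inference endpoint; the helper names and the endpoint constant are our own illustration.

```python
import json

# Fine-tuned checkpoints listed in Section 2.1 (footnotes 4-9).
TASK_MODELS = {
    "emotion": "cardiffnlp/twitter-roberta-base-emotion",
    "sentiment": "cardiffnlp/twitter-roberta-base-sentiment",
    "hate": "cardiffnlp/twitter-roberta-base-hate",
    "irony": "cardiffnlp/twitter-roberta-base-irony",
    "offensive": "cardiffnlp/twitter-roberta-base-offensive",
    "emoji": "cardiffnlp/twitter-roberta-base-emoji",
}

# Base URL of the HuggingFace hosted inference endpoint (an assumption of this sketch).
INFERENCE_URL = "https://api-inference.huggingface.co/models/"

def build_request(task: str, text: str) -> tuple:
    """Return the URL and JSON body for classifying `text` under `task`."""
    if task not in TASK_MODELS:
        raise ValueError(f"unknown task: {task}")
    url = INFERENCE_URL + TASK_MODELS[task]
    body = json.dumps({"inputs": text}).encode("utf-8")
    return url, body

url, body = build_request("sentiment", "Great talk at the conference today!")
```

Sending `body` in an authenticated POST request to `url` (e.g. with `urllib.request`) would return the class scores for the tweet; NLP4SM's server side performs this step through the HuggingFace API, as described in Section 3.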
Figure 1: Sentiment analysis, 'text mode'.

Figure 2: Sentiment analysis, 'Twitter mode'.

3. Description of NLP4SM

We aim at providing a unified and accessible platform for assessing and analysing social network text classification models. Hence, we have developed a web application for the first version of NLP4SM.

NLP4SM is built upon a client-server architecture led by a REST API. Moreover, we have relied on external services for running the language models. NLP4SM uses HuggingFace because it is currently the cloud service that hosts the language models included in NLP4SM, it is the artificial intelligence service platform most used by the NLP research community, and it provides a high-quality service.

The server side is developed in Python and is based on the micro-framework Flask. The server side is responsible for the communication with HuggingFace through its API. Moreover, the server side queries Twitter according to the user query.

The client side is a web interface based on JavaScript React. It allows two different forms of evaluating the models, namely:

Text mode: It evaluates any language model described in Section 2 on a span of text written by the user in a text box. Several charts show the result of the evaluation. Figure 1 depicts an example of the text mode.

Twitter mode: It processes a set of tweets returned in real time from Twitter according to the user query. The user can configure the query according to the language, the time and the specific text of the query. NLP4SM retrieves the tweets and shows the result of running the selected language model with different kinds of charts. Figure 2 depicts an example of the Twitter mode.

4. Conclusions and future work

In this paper, we presented the prototype demonstration of NLP4SM, which aims at easing the access, analysis and comparison of classification models based on language models for different NLP tasks with real data from social networks. NLP4SM allows the evaluation of any span of text, as well as the evaluation of tweets from a user query.

We plan as future work: (1) to integrate more NLP tasks, (2) to extend the number of language models considered, and (3) to add a greater number of visualisation methods for the results.

Acknowledgments

This research work is supported by the R&D&I grant PID2020-116118GA-I00 funded by MCIN/AEI/10.13039/501100011033.

References

[1] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: Proc. of Workshop at ICLR, 2013.
[2] M. T. Pilehvar, J. Camacho-Collados, Embeddings in natural language processing: theory and advances in vector representations of meaning, Synthesis Lectures on Human Language Technologies 13 (2020) 1–175.
[3] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proc. of the 2019 Conf. of the NAACL, Vol. 1 (Long and Short Papers), 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
[4] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.
[5] E. Martínez-Cámara, M. T. Martín-Valdivia, L. A. Ureña-López, A. Montejo-Ráez, Sentiment analysis in twitter, Natural Language Engineering 20 (2014) 1–28. doi:10.1017/S1351324912000332.
[6] A. Wang, Y. Pruksachatkun, N. Nangia, A. Singh, J. Michael, F. Hill, O. Levy, S. Bowman, SuperGLUE: A stickier benchmark for general-purpose language understanding systems, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 32, Curran Associates, Inc., 2019. URL: https://proceedings.neurips.cc/paper/2019/file/4496bf24afe7fab6f046bf4923da8de6-Paper.pdf.
[7] F. Barbieri, J. Camacho-Collados, L. Espinosa Anke, L. Neves, TweetEval: Unified benchmark and comparative evaluation for tweet classification, in: Findings of the ACL: EMNLP 2020, 2020. URL: https://aclanthology.org/2020.findings-emnlp.148. doi:10.18653/v1/2020.findings-emnlp.148.
[8] J. Camacho-Collados, K. Rezaee, T. Riahi, A. Ushio, D. Loureiro, D. Antypas, J. Boisson, L. Espinosa-Anke, F. Liu, E. Martínez-Cámara, et al., TweetNLP: Cutting-edge natural language processing for social media, arXiv preprint arXiv:2206.14774 (2022).
[9] S. Mohammad, F. Bravo-Marquez, M. Salameh, S. Kiritchenko, SemEval-2018 task 1: Affect in tweets, in: Proc. of The 12th Int. Workshop on Semantic Evaluation, 2018, pp. 1–17. URL: https://aclanthology.org/S18-1001. doi:10.18653/v1/S18-1001.
[10] S. Rosenthal, N. Farra, P. Nakov, SemEval-2017 task 4: Sentiment analysis in Twitter, in: Proc. of the 11th Int. Workshop on Semantic Evaluation (SemEval-2017), 2017, pp. 502–518. URL: https://aclanthology.org/S17-2088. doi:10.18653/v1/S17-2088.
[11] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. Rangel Pardo, P. Rosso, M. Sanguinetti, SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 54–63. URL: https://aclanthology.org/S19-2007. doi:10.18653/v1/S19-2007.
[12] C. Van Hee, E. Lefever, V. Hoste, SemEval-2018 task 3: Irony detection in English tweets, in: Proc. of The 12th Int. Workshop on Semantic Evaluation, 2018. URL: https://aclanthology.org/S18-1005. doi:10.18653/v1/S18-1005.
[13] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval), in: Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 75–86. URL: https://aclanthology.org/S19-2010. doi:10.18653/v1/S19-2010.
[14] F. Barbieri, J. Camacho-Collados, F. Ronzano, L. Espinosa-Anke, M. Ballesteros, V. Basile, V. Patti, H. Saggion, SemEval-2018 task 2: Multilingual emoji prediction, in: Proceedings of The 12th International Workshop on Semantic Evaluation, Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 24–33. URL: https://aclanthology.org/S18-1003. doi:10.18653/v1/S18-1003.
[15] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 8440–8451. URL: https://aclanthology.org/2020.acl-main.747. doi:10.18653/v1/2020.acl-main.747.
[16] F. Barbieri, L. Espinosa-Anke, J. Camacho-Collados, A Multilingual Language Model Toolkit for Twitter, arXiv preprint arXiv:2104.12250, 2021.
[17] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).