NLP4SM: Natural Language Processing for Social Media

Gonzalo Medina Medina1, Jose Camacho Collados2 and Eugenio Martínez Cámara1

1 Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Spain
2 School of Computer Science and Informatics, Cardiff University, United Kingdom

Abstract
NLP4SM is a website for the execution, analysis and comparison of tweet classification methods based on language models. NLP4SM currently supports the text classification tasks considered in TweetEval, but it aims to integrate additional text classification tasks and to widen the number of available language models, with the goal of becoming a benchmark platform for assessing text classification methods on real data from social media.

Keywords
Language models, text classification, social media.

1. Introduction

The most likely source of the vertiginous progress of Natural Language Processing (NLP) in recent years is the proposal of the Word2Vec model [1], which eases the generation of unsupervised linguistic features known as word embeddings, which represent the meaning of words as vectors of real numbers. The strong results reached by Word2Vec-based word embeddings encouraged the design of new word embedding models, such as GloVe.1 These models assign a single embedding vector to each word regardless of its context, and for this reason the next landmark was set by contextual word embedding models [2]. Transformer models stand out among contextual word embeddings, with BERT [3] as an outstanding example. These models are known as language models, and their capacity to represent the meaning of words, coupled with the possibility of using them as pre-trained models, has driven the progress of a broad branch of NLP tasks, especially those linked to the classification of the semantic meaning of a text, such as the opinion polarity of a review, or the offensive or underlying emotional meaning of a message.

The potential of language models has made them the baseline of a wide range of NLP tasks, and they can even be used for developing learning models in production environments. Moreover, the ease of tuning these models to specific NLP tasks has led to the development and release of a huge number of pre-trained language models for a wide range of NLP tasks, with HuggingFace and especially its Transformers library [4] standing out. This vast variety of language models makes their comparison and analysis really difficult as a previous step to choosing the particular language model to fine-tune for a specific use case.

1 https://nlp.stanford.edu/projects/glove/
The particular use of language in social networks makes it necessary to adapt NLP methods to the specific use of language of each social network, as for instance Twitter [5]. Language models also need this fitting to the language of social networks, which makes them reach the top of most NLP shared tasks.

The great availability of language models has not been coupled with the release of web platforms for comparing and analysing the different language models on specific NLP tasks. Nevertheless, the related issue of the great availability of training corpora and the evaluation of learning models is beginning to be resolved by the publication of leaderboards of learning models trained on gold standards, such as SuperGLUE [6] or TweetEval [7].

SEPLN-PD 2022. Annual Conference of the Spanish Association for Natural Language Processing 2022: Projects and Demonstrations, September 21-23, 2022, A Coruña, Spain
gmedina95@correo.ugr.es (G. Medina Medina); camachocolladosj@cardiff.ac.uk (J. Camacho Collados); emcamara@decsai.ugr.es (E. Martínez Cámara)
ORCID: 0000-0003-1618-7239 (J. Camacho Collados); 0000-0002-5279-8355 (E. Martínez Cámara)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).
Following the example of the NLP classification task leaderboards, we present the web platform NLP4SM,2,3 whose demonstrative prototype is described in this paper. NLP4SM is a web application for analysing the performance of Twitter language models fine-tuned to the tasks of (1) sentiment analysis, (2) emotion analysis, (3) offensive language classification, (4) hate speech classification, (5) irony detection and (6) stance classification on abortion, climate change, atheism, feminism and Hillary Clinton. NLP4SM allows, on the one hand, the classification of a free span of text and, on the other hand, the classification of the meaning of a set of tweets returned by Twitter. Furthermore, the classification results are shown as charts to ease their understanding. NLP4SM can be used both by non-NLP experts and by NLP scientists who need to compare different language models on one of the mentioned tasks with real data.

2 Prototype: https://nlp4sm.on.fleek.co/
3 Production [8]: https://tweetnlp.org/demo/
The design of the system allows the consideration of new language models for the previous NLP tasks, as well as the incorporation of new result visualisation methods.

2. Language Models in NLP4SM

The first version of NLP4SM incorporates learning models that classify the meaning of tweets. The learning models are based on the fine-tuning of Twitter language models to the specific NLP tasks, which we describe below.

2.1. NLP tasks

We selected the NLP tasks according to their scientific relevance, as well as the high social demand for automatic systems that can identify specific kinds of messages. The tasks are also part of TweetEval, and we present them as follows.

Emotion analysis: It identifies the underlying emotion of a text. Although it is a multi-label task, we redefined it as a multi-class classification task. The corpus "Affect in Tweets" [9] was used to fit the model to the most frequent emotions of the corpus: joy, optimism, anger and sadness.4

Sentiment analysis: It classifies the opinion polarity as positive, negative or neutral. The corpus of subtask A of the "Sentiment Analysis in Twitter" task of SemEval17 [10] was used to fit the model.5

Hate speech: It aims at classifying whether a tweet expresses hate. The corpus of HatEval from SemEval19 was used to fit the model [11].6

Irony detection: The goal is to classify whether a tweet is ironic. The corpus of the Irony Detection task from SemEval18 was used to fit the model [12].7

Offensive language: It identifies whether a span of text has an offensive meaning. The corpus of OffensEval from SemEval19 was used to fit the model [13].8

Emoji prediction: It aims at predicting the emoji that best represents the meaning of a tweet. The corpus of Emoji Prediction from SemEval18 was used to fit the model [14].9

Stance classification: It classifies the author's stance towards a topic. The corpus of the Detecting Stance task from SemEval16 was used to fit the model. The topics considered are: abortion,10 atheism,11 feminism,12 climate change13 and Hillary Clinton.14

Multilinguality: Social networks are multilingual, and for this reason NLP4SM also allows the analysis of multilingual language models, namely those based on XLM-R [15], which is fitted on a large set of tweets written in more than 50 languages. NLP4SM also provides the XLM-T language model fitted to the sentiment analysis task in eight different languages [16].

2.2. Language Models

The language models currently included in NLP4SM match the ones in TweetEval, and they are available in HuggingFace. We have used the RoBERTa-base model [17] pre-trained on English text from social networks [7]. The fine-tuning of RoBERTa-base to each NLP task is based on an output layer with the same number of output units as the number of classes of each task [17]. The language models used are described and linked in Section 2.1.

4 https://huggingface.co/cardiffnlp/twitter-roberta-base-emotion
5 https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment
6 https://huggingface.co/cardiffnlp/twitter-roberta-base-hate
7 https://huggingface.co/cardiffnlp/twitter-roberta-base-irony
8 https://huggingface.co/cardiffnlp/twitter-roberta-base-offensive
9 https://huggingface.co/cardiffnlp/twitter-roberta-base-emoji
10 https://huggingface.co/cardiffnlp/twitter-roberta-base-stance-abortion
11 https://huggingface.co/cardiffnlp/twitter-roberta-base-stance-atheism
12 https://huggingface.co/cardiffnlp/twitter-roberta-base-stance-feminist
13 https://huggingface.co/cardiffnlp/twitter-roberta-base-stance-climate
14 https://huggingface.co/cardiffnlp/twitter-roberta-base-stance-hillary
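Each task is thus served by a RoBERTa-base checkpoint fine-tuned on the corresponding TweetEval corpus and hosted on HuggingFace. The following is a minimal sketch, not part of NLP4SM's actual code, of how a client could resolve a task name to its model repository and build a classification request for the HuggingFace hosted inference endpoint; the helper names and the endpoint constant are our own illustration.

```python
import json

# Fine-tuned checkpoints listed in Section 2.1 (footnotes 4-9).
TASK_MODELS = {
    "emotion": "cardiffnlp/twitter-roberta-base-emotion",
    "sentiment": "cardiffnlp/twitter-roberta-base-sentiment",
    "hate": "cardiffnlp/twitter-roberta-base-hate",
    "irony": "cardiffnlp/twitter-roberta-base-irony",
    "offensive": "cardiffnlp/twitter-roberta-base-offensive",
    "emoji": "cardiffnlp/twitter-roberta-base-emoji",
}

# Base URL of the HuggingFace hosted inference endpoint (an assumption of this sketch).
INFERENCE_URL = "https://api-inference.huggingface.co/models/"

def build_request(task: str, text: str) -> tuple:
    """Return the URL and JSON body for classifying `text` under `task`."""
    if task not in TASK_MODELS:
        raise ValueError(f"unknown task: {task}")
    url = INFERENCE_URL + TASK_MODELS[task]
    body = json.dumps({"inputs": text}).encode("utf-8")
    return url, body

url, body = build_request("sentiment", "Great talk at the conference today!")
```

Sending `body` in an authenticated POST request to `url` (e.g. with `urllib.request`) would return the class scores for the tweet; NLP4SM's server side performs this step through the HuggingFace API, as described in Section 3.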
Figure 1: Sentiment analysis, 'text mode'.

Figure 2: Sentiment analysis, 'Twitter mode'.

3. Description of NLP4SM

We aim at providing a unified and accessible platform for assessing and analysing social network text classification models. Hence, we have developed a web application for the first version of NLP4SM.

NLP4SM is built upon a client-server architecture led by a REST API. Moreover, we have relied on external services for running the language models. NLP4SM uses HuggingFace because it is currently the cloud service that hosts the language models included in NLP4SM, it is the artificial intelligence service platform most used by the NLP research community, and it provides a high-quality service.

The server side is developed in Python and is based on the micro-framework Flask. The server side is responsible for the communication with HuggingFace through its API. Moreover, the server side queries Twitter according to the user query.

The client side is a web interface based on JavaScript React. It allows two different forms of evaluating the models, namely:

Text mode: It evaluates any language model described in Section 2 on a span of text written by the user in a text box. Several charts show the result of the evaluation. Figure 1 depicts an example of the text mode.

Twitter mode: It processes a set of tweets returned in real time from Twitter according to the user query. The user can configure the query according to the language, the time and the specific text of the query. NLP4SM retrieves the tweets and shows the result of running the selected language model with different kinds of charts. Figure 2 depicts an example of the Twitter mode.

4. Conclusions and future work

In this paper, we presented the prototype demonstration of NLP4SM, which aims at easing the access, analysis and comparison of classification models based on language models for different NLP tasks with real data from social networks. NLP4SM allows the evaluation of any span of text, as well as the evaluation of tweets from a user query.

We plan as future work: (1) to integrate more NLP tasks, (2) to extend the number of language models considered, and (3) to add a greater number of visualisation methods for the results.

Acknowledgments

This research work is supported by the R&D&I grant PID2020-116118GA-I00 funded by MCIN/AEI/10.13039/501100011033.

References

[1] T. Mikolov, K. Chen, G. Corrado, J. Dean, Efficient estimation of word representations in vector space, in: Proc. of Workshop at ICLR, 2013.
[2] M. T. Pilehvar, J. Camacho-Collados, Embeddings in natural language processing: theory and advances in vector representations of meaning, Synthesis Lectures on Human Language Technologies 13 (2020) 1–175.
[3] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proc. of the 2019 Conf. of the NAACL, Vol. 1 (Long and Short Papers), 2019, pp. 4171–4186. URL: https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423.
[4] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, A. Rush, Transformers: State-of-the-art natural language processing, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, 2020, pp. 38–45. URL: https://aclanthology.org/2020.emnlp-demos.6. doi:10.18653/v1/2020.emnlp-demos.6.
[5] E. Martínez-Cámara, M. T. Martín-Valdivia, L. A. Ureña-López, A. Montejo-Ráez, Sentiment analysis in twitter, Natural Language Engineering 20 (2014) 1–28. doi:10.1017/S1351324912000332.
[6] A. Wang, Y. Pruksachatkun, N. Nangia, A. Singh, J. Michael, F. Hill, O. Levy, S. Bowman, SuperGLUE: A stickier benchmark for general-purpose language understanding systems, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 32, Curran Associates, Inc., 2019. URL: https://proceedings.neurips.cc/paper/2019/file/4496bf24afe7fab6f046bf4923da8de6-Paper.pdf.
[7] F. Barbieri, J. Camacho-Collados, L. Espinosa Anke, L. Neves, TweetEval: Unified benchmark and comparative evaluation for tweet classification, in: Findings of the ACL: EMNLP 2020, 2020. URL: https://aclanthology.org/2020.findings-emnlp.148. doi:10.18653/v1/2020.findings-emnlp.148.
[8] J. Camacho-Collados, K. Rezaee, T. Riahi, A. Ushio, D. Loureiro, D. Antypas, J. Boisson, L. Espinosa-Anke, F. Liu, E. Martínez-Cámara, et al., TweetNLP: Cutting-edge natural language processing for social media, arXiv preprint arXiv:2206.14774 (2022).
[9] S. Mohammad, F. Bravo-Marquez, M. Salameh, S. Kiritchenko, SemEval-2018 task 1: Affect in tweets, in: Proc. of The 12th Int. Workshop on Semantic Evaluation, 2018, pp. 1–17. URL: https://aclanthology.org/S18-1001. doi:10.18653/v1/S18-1001.
[10] S. Rosenthal, N. Farra, P. Nakov, SemEval-2017 task 4: Sentiment analysis in Twitter, in: Proc. of the 11th Int. Workshop on Semantic Evaluation (SemEval-2017), 2017, pp. 502–518. URL: https://aclanthology.org/S17-2088. doi:10.18653/v1/S17-2088.
[11] V. Basile, C. Bosco, E. Fersini, D. Nozza, V. Patti, F. M. Rangel Pardo, P. Rosso, M. Sanguinetti, SemEval-2019 task 5: Multilingual detection of hate speech against immigrants and women in Twitter, in: Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 54–63. URL: https://aclanthology.org/S19-2007. doi:10.18653/v1/S19-2007.
[12] C. Van Hee, E. Lefever, V. Hoste, SemEval-2018 task 3: Irony detection in English tweets, in: Proc. of The 12th Int. Workshop on Semantic Evaluation, 2018. URL: https://aclanthology.org/S18-1005. doi:10.18653/v1/S18-1005.
[13] M. Zampieri, S. Malmasi, P. Nakov, S. Rosenthal, N. Farra, R. Kumar, SemEval-2019 task 6: Identifying and categorizing offensive language in social media (OffensEval), in: Proceedings of the 13th International Workshop on Semantic Evaluation, Association for Computational Linguistics, Minneapolis, Minnesota, USA, 2019, pp. 75–86. URL: https://aclanthology.org/S19-2010. doi:10.18653/v1/S19-2010.
[14] F. Barbieri, J. Camacho-Collados, F. Ronzano, L. Espinosa-Anke, M. Ballesteros, V. Basile, V. Patti, H. Saggion, SemEval-2018 task 2: Multilingual emoji prediction, in: Proceedings of The 12th International Workshop on Semantic Evaluation, Association for Computational Linguistics, New Orleans, Louisiana, 2018, pp. 24–33. URL: https://aclanthology.org/S18-1003. doi:10.18653/v1/S18-1003.
[15] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, in: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Online, 2020, pp. 8440–8451. URL: https://aclanthology.org/2020.acl-main.747. doi:10.18653/v1/2020.acl-main.747.
[16] F. Barbieri, L. Espinosa-Anke, J. Camacho-Collados, A Multilingual Language Model Toolkit for Twitter, arXiv preprint arXiv:2104.12250, 2021.
[17] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, arXiv preprint arXiv:1907.11692 (2019).