Deciphering Vaccine Sentiments: Transformer Models in Social Media Analysis

S. Sushmitha1, M. S. Shriram2 and S. Karthika3
1,2,3 Department of Information Technology, Sri Sivasubramaniyanadar College of Engineering, Kalavakkam, Chennai

Abstract
The Forum for Information Retrieval Evaluation, a platform emphasizing data evaluation and analysis, has conducted a novel challenge in the NLP domain titled Artificial Intelligence on Social Media. The main idea behind this task is to formulate an effective multi-label classifier that labels social media posts according to the specific concerns about vaccines expressed by the authors of the posts. The critical point here is that each post may express several distinct concerns and therefore carry multiple labels, which ideally requires models with stronger attention mechanisms for accurate classification. After appropriate pre-processing, our team applied three transformer-based approaches that produced better results than models with a more limited attention span. The models used were BERT, RoBERTa and XLNet. BERT and RoBERTa gave the same macro-F1 score of 0.57 but differed mildly on the Jaccard score, with RoBERTa scoring 0.57 and BERT 0.56. XLNet gave a macro-F1 score of 0.48 and a Jaccard score of 0.49. Our research primarily aims to throw light on why these state-of-the-art transformer approaches prove to be game changers in modern-day NLP.

Keywords
NLP, Transformers, BERT, RoBERTa, XLNet

Forum for Information Retrieval Evaluation, December 15-18, 2023, India
sushmitha2010422@ssn.edu.in (S. Sushmitha); shriram2010160@ssn.edu.in (M. S. Shriram); skarthika@ssn.edu.in (S. Karthika)
ORCID: 0009-0002-2396-266X (M. S. Shriram); 0000-0001-8919-5841 (S. Karthika)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction

In recent times, there is no greater illustration of how crucial vaccination is as a weapon against disease than the impact vaccines have had on controlling the COVID-19 pandemic. While there are numerous advantages to vaccination, there is still an extensive group of individuals who do not trust vaccines. Such skepticism has been fed by a number of issues, including doubts over the safety and effectiveness of vaccines and conspiracy theories about pharmaceutical companies and governments having hidden agendas. These issues need to be understood and addressed in order to encourage mass vaccination and secure public health. In this paper, we propose an exploratory method to analyze social media sentiment regarding vaccines. We aim to build an efficient multi-label classification system for the task of identifying and labeling the different types of issues raised in tweets about vaccines. Our typology covers issues such as concerns over vaccine safety and efficacy, concerns about government influence or manipulation, various forms of conspiracies, and more.
To accommodate the possibility that one tweet could address more than one issue, we have built our classification framework so that multiple labels can be assigned to each tweet. We utilize cutting-edge NLP models, namely BERT, RoBERTa and XLNet, which can learn subtle linguistic features and contextual information with great effectiveness. These models allow us to obtain an accurate reading of what people are saying in tweets, which is useful for understanding the sentiments and topics being discussed on social media. The goal of this study is to advance knowledge about public attitudes and receptivity towards immunization, in order to better develop public health information campaigns and communication strategies. With the power of social media data analytics and sophisticated NLP models at our disposal, we attempt to illuminate what may contribute to anti-vaccine attitudes, so as to increase awareness and immunization rates across the globe.

2. Related Work

This section provides an overview of research on developing architectures for comment classification in the context of the COVID-19 pandemic's impact on social media information flow. The urgency of reliable classification methods for the surge in COVID-19-related tweets has led to various approaches. One study [1] achieved remarkable F1 scores of 0.93 and 0.92 with two machine learning approaches for classifying tweets into three categories. Another work [2] focused on sentiment analysis of COVID-19-related Twitter posts from March to mid-April 2020, utilizing seven deep learning models with F1 scores consistently exceeding 0.90. In [3], advanced machine learning techniques resulted in 95.35% accuracy for sentiment analysis and 91.49% for topic classification. COVID-19 vaccine sentiment on Twitter was analyzed in [4], employing diverse data collection and preprocessing methods, along with classification using BERT Transformers. [5] introduced four deep learning models combining BERT with BiLSTM and BiGRU algorithms, excelling over classical machine learning models. [6] harnessed Bidirectional Long Short-Term Memory (Bi-LSTM) to analyze COVID-19 sentiments from Twitter and Reddit, surpassing conventional LSTM. [7] introduced a deep learning sentiment analysis model that achieved an accuracy of 78.062% on COVID-19-related tweets by focusing on sentence embeddings. [8] analyzed COVID-19 sentiment on Twitter, with BERT emerging as the most accurate model. [9] conducted sentiment analysis on Twitter discussions about COVID-19 vaccines, while [10] employed various sentiment analysis models, with BERT outperforming all. [11] conducted a comprehensive analysis of COVID-19 vaccine-related tweets, highlighting a shift from initial negativity to a more positive outlook. [12] investigated public perceptions of COVID-19 vaccine adverse effects through social media data, with LSTM achieving the highest accuracy. [13] analyzed #sideeffects social media conversations during the early COVID-19 vaccine rollout, with BERT outperforming other methods. [14] examined over 200,000 COVID-19 vaccination tweets, achieving up to 90% accuracy in classifying misleading content.
[15] focused on classifying COVID-19 vaccine-related tweets into negative, neutral, and positive sentiments, with Bidirectional LSTM (BiLSTM) standing out with a 94.12% accuracy rate. A variety of machine learning and deep learning models have been successful in capturing the subtleties of public opinion during the pandemic, as shown by the most recent work in this field, which reflects the extensive efforts made to assess sentiments in tweets related to COVID-19.

3. Dataset

The training dataset [16] provided by AISoMe FIRE 2023 [17] contains 9921 records of anti-vaccine tweets about COVID vaccines posted during 2020-21. Each record contains a tweet ID, the tweet text, and its labels. The important pattern to note is that each tweet can be categorized under multiple labels. The dataset encompasses 12 unique categories, each explicitly defined by distinct concerns and sentiments regarding vaccines:

- "Unnecessary": tweets suggesting vaccines are unnecessary or that alternate cures are superior;
- "Mandatory": objections against mandatory vaccination;
- "Pharma": skepticism towards pharmaceutical companies and their profit motives;
- "Conspiracy": deeper vaccine-related conspiracies;
- "Political": concerns about government or political influences on vaccination;
- "Country": objections based on the country of origin of a vaccine;
- "Rushed": concerns about rushed or inadequately tested vaccine processes;
- "Ingredients": worries regarding vaccine components and technology;
- "Side-effect": concerns about vaccine side effects and associated deaths;
- "Ineffective": questions about vaccine effectiveness;
- "Religious": objections to vaccines for religious reasons;
- "None": tweets with no specified reason, or with concerns outside the defined categories.

A test dataset [16] containing 486 records was also provided, on which the models made their predictions.

4. Methodology

Upon receiving the training dataset for the AISoMe track of the FIRE 2023 conference, we first identified a few inconsistencies in the textual data. To overcome these inconsistencies and provide the multi-label classifier with clean data, we preprocessed the data as follows. In the provided dataset, each post started off with tagged usernames; on social media, tagging a user involves the "@" symbol, so the first task was to remove these "@" tags. After this, we removed HTML tags, brackets and emojis, and converted all characters to lower case. This was followed by stop-word removal and lemmatization, leaving clean data for further processing. Since a single post can carry multiple labels, the task is one of multi-label classification; processing the labels involved splitting them up and encoding them into the appropriate numerical format using the multi-label binarizer technique. Having studied the attention mechanism of transformer-based models, we incorporated three such models, namely BERT, RoBERTa and XLNet, for the classification task. These models were chosen for their self-attention mechanism, which enables them to weigh the significance of different elements within a sequence simultaneously, giving them an edge over traditional models that rely on sequential data processing.
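To make these steps concrete, the sketch below illustrates one way the preprocessing pipeline could be implemented. It is a minimal sketch, not the exact code behind the submitted runs: the choice of NLTK for stop-word removal and lemmatization, scikit-learn's MultiLabelBinarizer for label encoding, the clean_tweet helper, and the ";" label separator are all assumptions made for illustration.

```python
# Minimal sketch of the preprocessing pipeline described above.
# NLTK and scikit-learn are assumed libraries; the paper does not name its tooling.
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.preprocessing import MultiLabelBinarizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def clean_tweet(text: str) -> str:
    """Strip mentions, HTML tags, brackets and emojis; lowercase; drop stop words; lemmatize."""
    text = re.sub(r"@\w+", " ", text)               # remove "@" user tags
    text = re.sub(r"<[^>]+>", " ", text)            # remove HTML tags
    text = re.sub(r"[\[\]\(\)\{\}]", " ", text)     # remove brackets
    text = text.encode("ascii", "ignore").decode()  # drop emojis / non-ASCII characters
    tokens = [
        LEMMATIZER.lemmatize(tok)
        for tok in text.lower().split()
        if tok not in STOP_WORDS
    ]
    return " ".join(tokens)

# Multi-label binarization: each tweet's label list becomes a 12-dimensional 0/1 vector.
# The ";"-separated raw label strings here are a hypothetical format.
raw_labels = ["side-effect;pharma", "none"]
mlb = MultiLabelBinarizer()
y = mlb.fit_transform([row.split(";") for row in raw_labels])
```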
4.1. BERT

The first model devised for this task is Bidirectional Encoder Representations from Transformers, known as BERT. It is a state-of-the-art model used in a variety of NLP tasks; in this case, we adapt it for multi-label classification. Any text classification model requires in-depth context extraction and learning, and BERT learns context efficiently through its MLM pre-training objective. MLM refers to a masked language model: certain tokens in a text are intentionally masked or hidden, and the model's task is to predict the missing tokens. BERT is bidirectional, meaning it takes into account both the left and right contexts of the masked tokens. When using BERT for classification, we append a classification layer on top of the pre-trained BERT model. The input text is tokenized and fed into BERT, and the final hidden state corresponding to the [CLS] token (a special token added to the input) is used as the representation of the entire input sequence. This [CLS] representation is passed through the classification layer, whose number of output units corresponds to the number of classes in the classification task; for this task there are 12 output units representing the 12 unique classes.

4.2. RoBERTa

RoBERTa, an extension of the BERT model, excels in various NLP tasks, including multi-label classification. Like BERT, it utilizes a Masked Language Model (MLM) during pre-training, in which certain tokens in the input text are concealed and the model's objective is to predict these hidden tokens, taking both left and right contexts into account. The key difference from BERT lies in the masking strategy: in the BERT architecture, masking is performed once during data preprocessing, resulting in a single static mask. To avoid a single static mask, the training data can be duplicated and masked 10 times, each with a different mask, over 40 epochs, so that each mask is reused for only 4 epochs. RoBERTa compares this with dynamic masking, in which a new mask is generated every time a sequence is passed to the model. For multi-label classification, RoBERTa adapts by adding a classification layer atop its pre-trained layers. Input text is tokenized, and the final hidden state associated with the [CLS] token is used as the sequence representation. The number of output units in the classification layer matches the number of classes. During inference, a sigmoid activation is used and a threshold is applied to determine the assigned labels.

4.3. XLNet

Much like BERT and RoBERTa, XLNet takes a bidirectional approach to context learning, but what sets XLNet apart is the autoregressive method it employs. Rather than masking tokens as in an MLM, XLNet is pre-trained with permutation language modeling: it maximizes the expected likelihood of a sequence over permutations of its factorization order, so each token is predicted using both left and right context without the input ever being corrupted by mask tokens. This objective helps XLNet capture contextual dependencies that masked-token objectives can miss. In multi-label classification, XLNet adapts seamlessly by introducing a classification layer atop its pre-trained layers. Text inputs are tokenized, and the final hidden state corresponding to the [CLS] token functions as the sequence representation. The number of output units is again 12, mirroring the number of classes in the classification task. During inference, a sigmoid activation function is used, with a threshold applied to establish the assigned labels, as in the other two models.
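All three models share the same classification setup: a pre-trained encoder, a 12-unit classification head, sigmoid activations, and a decision threshold at inference. The sketch below shows how this could look; the Hugging Face transformers library, the checkpoint name, and the 0.5 threshold are our assumptions, as the paper does not report them. Substituting "roberta-base" or "xlnet-base-cased" follows the same pattern.

```python
# Minimal sketch of a multi-label classification head on a pre-trained encoder,
# assuming the Hugging Face transformers library.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

NUM_LABELS = 12  # the 12 concern categories described in Section 3

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # one independent sigmoid/BCE output per label
)

def predict_labels(text: str, threshold: float = 0.5) -> list[str]:
    """Return all label names whose sigmoid score exceeds the threshold."""
    label_names = [model.config.id2label[i] for i in range(NUM_LABELS)]
    inputs = tokenizer(text, truncation=True, max_length=300, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits       # shape: (1, NUM_LABELS)
    probs = torch.sigmoid(logits).squeeze(0)  # per-label probabilities
    return [name for name, p in zip(label_names, probs) if p.item() > threshold]
```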
5. Evaluation and Results

The training phase involved fine-tuning several critical hyperparameters, each shaping model performance in its own way. Weight decay, represented by λ and set at 0.001, acted as a regularizer: it controlled the influence of the model weights on the loss function, curbing overfitting by penalizing large weight values. The learning rate, denoted η and set at 0.0001 or 0.00001 depending on the model, determined the step size for weight updates during training; an appropriate learning rate ensures that the model learns effectively from the data without overshooting optimal parameter values. Dropout rates between 0.4 and 0.5 regulated neuron activations; dropout is a regularization technique that prevents over-reliance on specific neurons by randomly deactivating a fraction of them during each training iteration, promoting generalization. The epochs parameter governed the number of complete passes through the training data, allowing the models to progressively learn and adapt. Finally, the maximum sequence length was set at 300, ensuring that input sequences were appropriately processed during training. These hyperparameters collectively facilitated the models' optimization; Table 1 specifies the values that gave the maximum training accuracy for each model.

Table 1
Comparison of models based on different hyperparameters and their corresponding values

Model     Weight decay   Learning rate   Dropout   Epochs   Max length   Training accuracy (%)
BERT      0.001          0.0001          0.5       10       300          99.2
RoBERTa   0.001          0.00001         0.4       10       300          98.12
XLNet     0.001          0.0001          0.5       10       300          96.02
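The sketch below shows how the Table 1 hyperparameters (BERT row) could be wired into a standard fine-tuning setup. The Hugging Face Trainer, the batch size, and the output path are assumptions; the paper reports the hyperparameter values but not the training harness.

```python
# Minimal sketch wiring the Table 1 hyperparameters (BERT row) into an
# assumed Hugging Face Trainer setup.
from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=12,
    problem_type="multi_label_classification",
    hidden_dropout_prob=0.5,            # dropout rate from Table 1
)

args = TrainingArguments(
    output_dir="bert-vaccine-concerns",  # hypothetical output path
    learning_rate=1e-4,                  # eta from Table 1 (1e-5 for RoBERTa)
    weight_decay=0.001,                  # lambda from Table 1
    num_train_epochs=10,                 # epochs from Table 1
    per_device_train_batch_size=16,      # batch size not reported; assumed
)

# train_ds would hold the preprocessed tweets, tokenized with max_length=300.
# trainer = Trainer(model=model, args=args, train_dataset=train_ds)
# trainer.train()
```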
In the evaluation phase, a dedicated test dataset was used to rigorously assess the performance of the trained models. Each model underwent a single run, generating predictions that were then submitted for evaluation. The evaluation hinged on two pivotal metrics, the macro F1 score and the Jaccard score; together, these criteria provided reliable benchmarks for assessing and contrasting the performance of the models.

The macro F1 score evaluates the performance of a classification model. The F1 score of a single class is the harmonic mean of that class's precision and recall:

$$\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{1}$$

The macro F1 score is the unweighted mean of the per-class F1 scores:

$$\mathrm{Macro\ F1} = \frac{\sum_{c} \mathrm{F1}_{c}}{\text{number of classes}} \tag{2}$$

Because it gives equal weight to each class, the macro F1 score is a good metric for multi-class problems where the classes are imbalanced.

The Jaccard index, also known as the Jaccard similarity coefficient, is a statistic used to measure the similarity between sample sets. For a predicted label set U and a true label set V,

$$\mathrm{Jaccard}(U, V) = \frac{|U \cap V|}{|U \cup V|} \tag{3}$$

The Jaccard score is particularly valuable for evaluating binary or multi-label classification models, as it ignores true negatives and focuses solely on the overlap between predicted and actual positive labels.

Table 2
Run results

Model     Macro F1   Jaccard
RoBERTa   0.57       0.57
BERT      0.57       0.56
XLNet     0.48       0.49

Table 2 specifies the final macro F1 and Jaccard scores of the submitted runs for the different models.
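For reference, both evaluation metrics are available in scikit-learn; the toy sketch below mirrors Eqs. (1)-(3) on hypothetical label matrices. The sample-averaged variant of the Jaccard score is our assumption about how the track computes it.

```python
# Minimal sketch computing macro F1 (Eq. 2) and Jaccard (Eq. 3) with scikit-learn.
import numpy as np
from sklearn.metrics import f1_score, jaccard_score

# Binary indicator matrices: rows are tweets, columns are labels
# (a toy example with 3 labels instead of the full 12).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 0]])

macro_f1 = f1_score(y_true, y_pred, average="macro")        # equal weight per class
jaccard = jaccard_score(y_true, y_pred, average="samples")  # |U ∩ V| / |U ∪ V| per tweet, averaged

print(f"Macro F1: {macro_f1:.2f}, Jaccard: {jaccard:.2f}")
```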
6. Conclusion

This study harnessed advanced transformer-based models, BERT, RoBERTa and XLNet, to address the complex task of multi-label classification of tweets expressing concerns about vaccines, a corpus primarily comprising anti-vaccine sentiments. Notably, RoBERTa demonstrated the best performance, followed closely by BERT, with XLNet trailing behind. On this predominantly anti-vaccine dataset, the models showcased their prowess in discerning and categorizing the various concerns. To further enhance performance, future investigations could explore data augmentation techniques, given the data-hungry nature of transformer-based models. Additionally, fortifying model robustness through methods such as adversarial training presents a promising avenue. These efforts align with our mission of accurately categorizing concerns within the domain of vaccine-related discussions.

References

[1] A. Mondal, S. K. Mahata, M. Dey, D. Das, "Classification of COVID19 tweets using Machine Learning Approaches," SMM4H, June 2021.
[2] S. Vernikou, A. Lyras, A. Kanavos, "Multiclass sentiment analysis on COVID-19-related tweets using deep learning models," Neural Computing and Applications 34, 19615-19627 (2022). https://doi.org/10.1007/s00521-022-07650-2
[3] C. Gurkan, S. Kozalioglu, I. Akdag, C. Göçen, M. Palandoken, "COVID-19 Related Tweets Classification and Sentiment Analysis Based on Machine Learning Approaches and Deep Learning Architecture Designs: A Comprehensive Analysis," 2022. doi: 10.13140/RG.2.2.18200.47365/1.
[4] K. K. Agustiningsih, E. Utami, H. Al Fatta, "Sentiment Analysis of COVID-19 Vaccine on Twitter Social Media: Systematic Literature Review," 2021 IEEE 5th International Conference on Information Technology, Information Systems and Electrical Engineering (ICITISEE), Purwokerto, Indonesia, 2021, pp. 121-126. doi: 10.1109/ICITISEE53823.2021.9655960.
[5] A. S. Talaat, "Sentiment analysis classification system using hybrid BERT models," Journal of Big Data 10, 110 (2023). https://doi.org/10.1186/s40537-023-00781-w
[6] M. Arbane, R. Benlamri, Y. Brik, A. D. Alahmar, "Social media-based COVID-19 sentiment classification model using Bi-LSTM," Expert Systems with Applications 212:118710, 2023. doi: 10.1016/j.eswa.2022.118710.
[7] I. E. Fattoh, F. Kamal Alsheref, W. M. Ead, A. M. Youssef, "Semantic Sentiment Classification for COVID-19 Tweets Using Universal Sentence Encoder," Computational Intelligence and Neuroscience 2022:6354543, 2022. doi: 10.1155/2022/6354543.
[8] U. N. Wisesty, R. Rismala, W. Munggana, A. Purwarianti, "Comparative Study of Covid-19 Tweets Sentiment Classification Methods," 2021 9th International Conference on Information and Communication Technology (ICoICT), Yogyakarta, Indonesia, 2021, pp. 588-593. doi: 10.1109/ICoICT52021.2021.9527533.
[9] J. Philip, V. N. Thatha, M. Harshini, I. V. S. L. Haritha, S. Patil, B. Veerasekhar Reddy, "Classification of Covid-19 Vaccines tweets using Naïve Bayes Classification," 2022 6th International Conference on Electronics, Communication and Aerospace Technology, Coimbatore, India, 2022, pp. 1384-1387. doi: 10.1109/ICECA55336.2022.10009511.
[10] V. Battula, S. G. Goli, J. Nasigari, "Identification of Optimal Model for Multi-Class Classification of COVID Tweets," 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 2022, pp. 495-499. doi: 10.23919/INDIACom54597.2022.9763291.
[11] S. Gottipati, D. Guha, "Analysing Tweets on COVID-19 Vaccine: A Text Mining Approach," 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 2022, pp. 0467-0474. doi: 10.1109/CCWC54503.2022.9720793.
[12] K. R. S. N. Kariyapperuma, K. Banujan, P. M. A. K. Wijeratna, B. T. G. S. Kumara, "Classification of Covid19 Vaccine-Related Tweets Using Deep Learning," 2022 International Conference on Data Analytics for Business and Industry (ICDABI), Sakhir, Bahrain, 2022, pp. 1-5. doi: 10.1109/ICDABI56818.2022.10041615.
[13] A. R. Manjrekar, D. J. McConnell, S. S. Gokhale, "Analyzing Twitter Conversations on Side Effects of Covid-19 Vaccine," 2022 2nd International Conference on Intelligent Technologies (CONIT), Hubli, India, 2022, pp. 1-8. doi: 10.1109/CONIT55038.2022.9848134.
[14] S. Sharma, R. Sharma, A. Datta, "(Mis)leading the COVID-19 Vaccination Discourse on Twitter: An Exploratory Study of Infodemic Around the Pandemic," IEEE Transactions on Computational Social Systems. doi: 10.1109/TCSS.2022.3225216.
[15] N. Mansouri, M. Soui, I. Alhassan, M. Abed, "TextBlob and BiLSTM for Sentiment analysis toward COVID-19 vaccines," 2022 7th International Conference on Data Science and Machine Learning Applications (CDMA), Riyadh, Saudi Arabia, 2022, pp. 73-78. doi: 10.1109/CDMA54072.2022.00017.
[16] S. Poddar, A. M. Samad, R. Mukherjee, N. Ganguly, S. Ghosh, "CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines," in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '22), Association for Computing Machinery, New York, NY, USA, pp. 3154-3164. https://doi.org/10.1145/3477495.3531745
[17] S. Poddar, M. Basu, K. Ghosh, S. Ghosh, "Overview of the FIRE 2023 Track: Artificial Intelligence on Social Media (AISoMe)," in Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023.