VaxiBERT: A BERT-Based Classifier for Vaccine Tweets with Multi-Label Annotations

Shivangi Bithel, Samidha Verma, Prachi and Rajat Singh
Indian Institute of Technology, Delhi

Abstract
Vaccination has long been seen as an essential component of public health, providing a critical line of defense against infectious diseases. Our primary objective is to build a robust multi-label classification system capable of categorizing individual social media posts, specifically tweets, based on the numerous vaccine-related concerns stated by their writers. These reservations cover many issues, including misgivings about necessity, safety, and political intentions. Using cutting-edge models such as COVID-Twitter-BERT and OpenLLaMA-7B, we discuss our approach and evaluation results, shedding light on the intricate interplay between sentiment, society, and science in the arena of the vaccine debate. Our best-submitted run achieved a macro-F1 score of 0.67 and a Jaccard score of 0.70.

GitHub code: https://github.com/shivangibithel/VaxiBERT_AISoMe2023

Keywords
Sentiment Analysis, COVID-19 Vaccine Tweets, COVID-Twitter-BERT, Large Language Model, LoRA PEFT, Multi-label Classification

1. Introduction
A key component of public health for decades has been vaccination, a strong defense against the spread of infectious diseases. Its significance in preventing outbreaks and safeguarding local populations cannot be overstated. The crucial role that vaccination played in containing the COVID-19 pandemic, a global emergency that brought vaccines into the public eye like never before, has served as a reminder of their importance in the modern era. Beyond the pandemic response, widespread vaccine acceptance, especially at a societal level, remains essential for preventing disease resurgence, controlling childhood diseases, and reducing the yearly burden of seasonal illnesses such as influenza.
Forum for Information Retrieval Evaluation, December 15-18, 2023, India
csy207657@cse.iitd.ac.in (S. Bithel); csy207575@cse.iitd.ac.in (S. Verma); prachi@cse.iitd.ac.in (Prachi); rajat.singh@cse.iitd.ac.in (R. Singh)
https://shivangibithel.github.io/ (S. Bithel); https://github.com/prach6i (Prachi); https://github.com/rajatb115 (R. Singh)
ORCID: 0000-0002-6152-4866 (S. Bithel); 0009-0001-9513-3142 (S. Verma); 0009-0000-6663-5226 (Prachi); 0000-0002-9375-2580 (R. Singh)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

However, the vaccine environment is defined by a complexity that reaches far beyond the scientific arena. A distinct undercurrent of suspicion has emerged, spurred by issues ranging from politics to alleged side effects. This skepticism is a serious obstacle that must be addressed as we strive for widespread protection through vaccination. Understanding the complex issues surrounding vaccination is critical, and in this age of digital connectedness, social media platforms have emerged as a rich source of information. In this context, our work aims to navigate the complex web of public opinion on vaccines, mainly as expressed in social media's unfiltered and dynamic arena.

Our primary goal is to create a robust multi-label classification system capable of categorizing individual social media posts, specifically tweets, based on the various vaccine-related concerns raised by their authors. It is important to recognize that these concerns are not uniform; a single tweet may express several distinct vaccine-related concerns. To structure our classification task, we use a comprehensive set of concern labels that capture the wide range of anxieties permeating the vaccine discourse.
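Concretely, each tweet can be represented as a multi-hot indicator vector over the concern labels, so a tweet voicing several concerns activates several entries at once. A minimal sketch (the label set follows the task's concern categories; the helper name is ours):

```python
# Multi-hot encoding for multi-label classification: one binary slot per concern.
LABELS = ["Unnecessary", "Mandatory", "Pharma", "Conspiracy", "Political",
          "Country", "Rushed", "Ingredients", "Side-effects", "Ineffective",
          "Religious", "None"]

def encode(concerns):
    """Map a set of concern names to a 0/1 vector over LABELS."""
    return [1 if label in concerns else 0 for label in LABELS]

# A tweet expressing both mandate and necessity concerns gets two active labels.
vec = encode({"Mandatory", "Unnecessary"})
```

A classifier for this task then predicts one such vector per tweet rather than a single class.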
These labels cover a variety of concerns, including skepticism about the necessity and safety of vaccinations, suspicions of larger conspiracies, political motivations behind vaccination mandates, and uncertainty about the effectiveness of vaccines. Concerns also include vaccines' origins, composition, and alleged negative effects, with personal religious beliefs influencing opinions. In a time when information travels through digital channels at unprecedented speed, our study aims to use the vast amounts of data generated on social media platforms to shed light on the complex world of vaccine apprehension. By analyzing the concerns raised by individuals in their tweets, we aim to provide insights that can inform public health strategies, enhance vaccine communication, and foster a more nuanced understanding of the interplay between science, society, and sentiment in the field of vaccination.

2. Task
In this paper, we present an effective approach to the task "Building an effective multi-label classifier to label a social media post (particularly, a tweet) according to the specific concern(s) towards vaccines as expressed by the author of the post", organized as part of the AISoMe (Artificial Intelligence on Social Media) track at FIRE (Forum for Information Retrieval Evaluation) 2023 [1, 2]. A tweet can have more than one label (concern); e.g., a tweet expressing three different concerns towards vaccines will have three labels. The tweets are classified into the classes described below, with examples:
• Unnecessary - "The tweet indicates vaccines are unnecessary or that alternate cures are better."
• Mandatory - "Against mandatory vaccination — The tweet suggests that vaccines should not be made mandatory."
• Pharma - "Against Big Pharma — The tweet indicates that the Big Pharmaceutical companies are just trying to earn money, or the tweet is against such companies in general because of their history."
• Conspiracy - "Deeper Conspiracy — The tweet suggests some deeper conspiracy, and not just that the Big Pharma wants to make money (e.g., vaccines are being used to track people, COVID is a hoax)" • Political - "Political side of vaccines — The tweet expresses concerns that the govern- ments/politicians are pushing their own agenda through the vaccines." • Country - "Country of origin — The tweet is against some vaccine because of the country where it was developed/manufactured" • Rushed - "Untested / Rushed Process — The tweet expresses concerns that the vaccines have not been tested properly or that the published data is inaccurate." • Ingredients - "Vaccine Ingredients/technology — The tweet expresses concerns about the ingredients present in the vaccines (e.g., fetal cells, chemicals) or the technology used (e.g., mRNA vaccines can change your DNA)" • Side-effects - "Side Effects / Deaths — The tweet expresses concerns about the side effects of the vaccines, including deaths caused." • Ineffective - "Vaccine is Ineffective — The tweet expresses concerns that the vaccines are not effective enough and are useless." • Religious - "Religious Reasons — The tweet is against vaccines because of religious reasons" • None - "No specific reason stated in the tweet, or some reason other than the given ones." Given below are a few examples of tweets along with their labels: • "FYI....there are plenty of people walking around without vaccines for all sorts of conta- gious diseases/viruses. Why is Covid so different? We must ask why a mandatory vaccine card is even a consideration if the ones who are vaccinated feel that it protects them." 
: Mandatory, Unnecessary
• "So there have been issues, but FDA are so desperate they deny it’s the vaccine FDA reports facial paralysis in 4 volunteers for Pfizer’s Covid-19 vaccine, but FDA denies vaccine is the cause - Business Line https://t.co/nD8gwuxbvu" : Side-effects
• "If this is seen as the deadliest disease in our lifetimes, and consequently the vaccine viewed as a miraculous panacea, why is Pfizer’s stock price virtually unchanged from the beginning of the year?" : Pharma
• "@MelanieMetz6 @XSOmegaMkII Inovio...look it up as well as Moderna. All 3 of these delivery methods have nano technology that can deliver DNA/RNA gene coding & mutation. This isnt a joke or up for speculation....its way beyond that now. I have leukaemia with 17q deletion So No." : Side-effects, Ingredients, Conspiracy
• "Doctors Around the World Issue Dire WARNING: DO NOT GET THE COVID VACCINE!! https://t.co/JD5mlPTbVt via @Prepare_Change" : None
• "This is the same CEO that sold 60+% of his stock in Pfizer on the day of the vaccine announcement. Sell the news, don’t take the vaccine, he seems super bullish on the long term successful prospects if this vaccine. https://t.co/m5dS8Y9Q9t" : Ineffective, Pharma

3. Related Work
Users express their opinions regarding healthcare, diseases, treatments, vaccines, and immunization campaigns on microblogs like Twitter. In social computing, information extraction from such text-based tweets is increasingly popular. Classical machine learning techniques such as linear classifiers, Naive Bayes classifiers, and support vector machines, as well as deep neural techniques such as Long Short-Term Memory networks (LSTMs) [3], bidirectional RNNs [4], BERT (Bidirectional Encoder Representations from Transformers) [5], and RoBERTa [6], have been applied to this problem. More recent approaches to natural language processing build on large pre-trained language models such as T5 [7], GPT-3 [8], LLaMA [9], PaLM [10], and many more.

3.1.
BERT
BERT [5] is a highly effective transformer-based architecture that adapts well to numerous natural language processing tasks. BERT pre-trains deep bidirectional representations from unlabeled text, which preserves more of the context and logical flow of the text. The model is pre-trained using masked language modelling (MLM) and next-sentence prediction (NSP) tasks. By adding a single output layer, the BERT model can be fine-tuned for a variety of downstream tasks, achieving state-of-the-art performance.

3.2. LLaMA
The Large Language Model Meta AI [9], abbreviated as LLaMA, represents a significant advancement in natural language processing. This collection of state-of-the-art foundation language models spans a spectrum of sizes, ranging from 7 billion to 65 billion parameters. What sets LLaMA apart is its ability to deliver exceptional performance while maintaining a comparatively small model size, thereby reducing the computational demands typically associated with cutting-edge language models. LLaMA's foundation models have been trained on a diverse and extensive range of unlabeled datasets, drawn from sources such as CommonCrawl, C4, GitHub, Wikipedia, books, ArXiv, StackExchange, and more. The amalgamation of these varied datasets has empowered LLaMA to attain state-of-the-art performance, rivaling other top-performing models such as Chinchilla-70B [11] and PaLM-540B [10].

4. Dataset
This work uses a training dataset created as part of the research project "CAVES: A dataset to facilitate explainable classification and summarization of concerns towards COVID-19 vaccines" [12]. This carefully curated training dataset includes a sizable corpus of 9,921 tweets critical of COVID-19 vaccines. These tweets were collected between 2020 and 2021 and have undergone meticulous manual annotation by subject-matter specialists.
Each tweet in this dataset has been carefully assigned the concern categories relevant to our research objectives. To assess the generalizability and robustness of our classification system, the test set encompasses approximately 500 tweets obtained from diverse sources. These tweets are not exclusively centered on COVID-19 vaccines; they span a broader spectrum, incorporating discussions of other vaccine types, such as the MMR and influenza vaccines.

5. Pre-processing
In line with prior research [13, 14], we conducted extensive pre-processing of the tweet data to enhance the quality of word embeddings. Tweets inherently feature unique lexicons, including elements such as hashtags, @USER mentions, HTTP URLs, and emojis. These elements often introduce noise if left unattended and adversely affect model performance. Therefore, we implemented a comprehensive data-cleaning pipeline as part of our tweet pre-processing procedure, encompassing the following key steps:
• Stop Word Removal: To streamline the text and emphasize essential information, we eliminated common stop words such as "the," "a," "an," and "in." These words typically do not contribute significant meaning to the text.
• Lowercasing: Given the informal nature of tweets, we converted all words to lowercase. This practice standardizes the text and ensures that each word is represented consistently, facilitating more effective text analysis.
• Emoji Conversion: Emojis are frequently used on Twitter to express emotions and sentiments. Recognizing their importance, we refrained from removing them outright and instead converted emojis to their corresponding textual representations. This transformation retained the sentiment and emotional context of the text. The 'emoji' library (https://pypi.org/project/emoji/) aided in this process.
• Contractions Expansion: We systematically expanded contractions to their original, uncontracted forms to promote text standardization. For example, "don't" was expanded to "do not."
We performed this expansion using the 'contractions' library (https://pypi.org/project/contractions/).
• Non-Alphanumeric Character Removal: Extraneous non-letter characters, including brackets, colons, semicolons, @ symbols, and the like, were removed from the text. This step contributed to text cleanliness and coherence.
• URL Removal: URLs, which are unrelated to sentiment analysis, were purged from the text using regular expressions. This exclusion helped focus the analysis on the textual content pertinent to sentiment assessment.

6. Methodology
• Run 1: COVID-Twitter-BERT (CT-BERT): We used a domain-specific transformer-based model called CT-BERT [15]. We chose this model because BERT-Large is trained on Wikipedia data, and a model pre-trained in the same domain, in this case COVID-19-related tweets, would give better results after fine-tuning on the provided training data. We shuffled the training data, then split it into training and validation sets in an 80:20 ratio such that the percentage of instances of each class was preserved in both sets. Both training and validation instances were pre-processed as explained in Section 5. The resulting training data was used for fine-tuning CT-BERT [15], while the validation data was used for evaluation. We trained the model for 15 epochs with a learning rate of 2e-5. The test data was pre-processed using the same steps as the training and validation data; the fine-tuned model then scored each tweet against all the classes. We applied the sigmoid function to the output logits and used a threshold of 0.5 to predict the labels. The final prediction file containing the tweet ID and the predicted classes was submitted as Run 1 for the task.
• Run 2: OpenLLaMA-7B: We fine-tuned the OpenLLaMA-7B [9] model variant for the task at hand. The training data instances were pre-processed as explained in Section 5.
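The cleaning steps of Section 5 can be sketched as follows. This is a simplified, dependency-free illustration: our runs used the 'emoji' and 'contractions' libraries named above, which are stood in for here by small hand-written mappings, and the stop-word list is only a sample:

```python
import re

# Stand-ins for the `emoji` and `contractions` libraries (see Section 5).
EMOJI_MAP = {"\U0001F489": " syringe "}                  # e.g. the syringe emoji
CONTRACTIONS = {"don't": "do not", "can't": "cannot", "it's": "it is"}
STOP_WORDS = {"the", "a", "an", "in"}                    # sample stop words

def clean_tweet(text: str) -> str:
    text = text.lower()                                  # lowercasing
    for emo, name in EMOJI_MAP.items():                  # emoji -> textual form
        text = text.replace(emo, name)
    for short, full in CONTRACTIONS.items():             # expand contractions
        text = text.replace(short, full)
    text = re.sub(r"https?://\S+", " ", text)            # URL removal
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # drop @, #, punctuation
    return " ".join(t for t in text.split() if t not in STOP_WORDS)

print(clean_tweet("Don't trust the vaccine! https://t.co/abc123"))
# prints "do not trust vaccine"
```

In the actual pipeline, emoji.demojize() and contractions.fix() would replace the two dictionary passes, and a full stop-word list would be used.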
Following the same methodology as [16], we use the prefix-tuning technique, which falls into the larger category of PEFT (parameter-efficient fine-tuning) approaches. In this approach, we learn a set of adaption prompt tokens, which are prepended to the inputs of the top-N transformer layers. During fine-tuning, only these prompt tokens are updated for the downstream task, while the rest of the LLM parameters remain frozen. In addition, a zero-initialized, zero-gated attention mechanism is used to inject the fine-tuned prompt-token knowledge into the existing model, so that the original model parameters do not deviate too much due to noise in the initial learning phase. We use 10 extra learnable prompt tokens in our setting and prepend them to the top 30 transformer layers. This adds only an extra 1.2M parameters over the existing 7B frozen parameters; training on the 9,921 data points took about 10 minutes with a batch size of 4, a maximum sequence length of 512, 5 epochs, and a learning rate of 9e-3. We generate the classification labels from the pre-processed test data by framing the problem as a text-generation task.
• Run 3: OpenLLaMA-7B: In Run 3, we fine-tuned the OpenLLaMA-7B model variant for multi-label classification without pre-processing the tweets. All other details of model training are the same as in Run 2. In this run, we generate the classification labels from the raw test data as a text-generation task.

7. Evaluation
AISoMe track results are evaluated using the macro-F1 score and the Jaccard score. The results of our three submitted runs are shown in Table 1.

Run     Team_ID   macro-F1 score   Jaccard score   Rank
Run 1   DSIRC     0.67             0.70            4
Run 2   DSIRC     0.57             0.61            13
Run 3   DSIRC     0.55             0.60            16

Table 1: Results of the AISoMe track task.

8. Conclusion and Future Work
This study employs COVID-Twitter-BERT and OpenLLaMA-7B to categorize vaccination-related
The transformer-based model outperforms the fine-tuned OpenLLaMA-7B-based clas- sifiers because its word embeddings are more expressive and yield better results on test data. Furthermore, because transformer-based models require many data, we recommend looking at data augmentation solutions to improve the performance of our model. Another aspect would be to train the model to become more robust against adversaries. References [1] S. Poddar, A. M. Samad, R. Mukherjee, N. Ganguly, S. Ghosh, Caves: A dataset to facilitate explainable classification and summarization of concerns towards covid vaccines, in: Pro- ceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022, pp. 3154–3164. [2] S. Poddar, M. Basu, K. Ghosh, S. Ghosh, Overview of the fire 2023 track:artificial intelligence on social media (aisome), in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023. [3] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, J. Schmidhuber, Lstm: A search space odyssey, IEEE Transactions on Neural Networks and Learning Systems 28 (2015) 2222–2232. URL: https://api.semanticscholar.org/CorpusID:3356463. [4] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning internal representations by error propagation, 1986. URL: https://api.semanticscholar.org/CorpusID:62245742. [5] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, 2019. URL: https://aclanthology.org/N19-1423. doi:10. 18653/v1/N19-1423. [6] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. 
Stoy- anov, Roberta: A robustly optimized bert pretraining approach, 2019. arXiv:1907.11692. [7] C. Raffel, N. M. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the limits of transfer learning with a unified text-to-text transformer, J. Mach. Learn. Res. 21 (2019) 140:1–140:67. URL: https://api.semanticscholar.org/CorpusID: 204838007. [8] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, D. Amodei, Language models are few-shot learners, 2020. arXiv:2005.14165. [9] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A. Joulin, E. Grave, G. Lample, Llama: Open and efficient foundation language models, ArXiv abs/2302.13971 (2023). URL: https: //api.semanticscholar.org/CorpusID:257219404. [10] A. Chowdhery, S. Narang, J. Devlin, M. Bosma, G. Mishra, A. Roberts, P. Barham, H. W. Chung, C. Sutton, S. Gehrmann, P. Schuh, K. Shi, S. Tsvyashchenko, J. Maynez, A. Rao, P. Barnes, Y. Tay, N. M. Shazeer, V. Prabhakaran, E. Reif, N. Du, B. C. Hutchinson, R. Pope, J. Bradbury, J. Austin, M. Isard, G. Gur-Ari, P. Yin, T. Duke, A. Levskaya, S. Ghemawat, S. Dev, H. Michalewski, X. García, V. Misra, K. Robinson, L. Fedus, D. Zhou, D. Ippolito, D. Luan, H. Lim, B. Zoph, A. Spiridonov, R. Sepassi, D. Dohan, S. Agrawal, M. Omernick, A. M. Dai, T. S. Pillai, M. Pellat, A. Lewkowycz, E. Moreira, R. Child, O. Polozov, K. Lee, Z. Zhou, X. Wang, B. Saeta, M. Díaz, O. Firat, M. Catasta, J. Wei, K. S. Meier-Hellstern, D. Eck, J. Dean, S. Petrov, N. Fiedel, Palm: Scaling language modeling with pathways, ArXiv abs/2204.02311 (2022). 
URL: https://api.semanticscholar.org/CorpusID:247951931.
[11] J. Hoffmann, S. Borgeaud, A. Mensch, E. Buchatskaya, T. Cai, E. Rutherford, D. de Las Casas, L. A. Hendricks, J. Welbl, A. Clark, T. Hennigan, E. Noland, K. Millican, G. van den Driessche, B. Damoc, A. Guy, S. Osindero, K. Simonyan, E. Elsen, J. W. Rae, O. Vinyals, L. Sifre, Training compute-optimal large language models, ArXiv abs/2203.15556 (2022). URL: https://api.semanticscholar.org/CorpusID:247778764.
[12] S. Poddar, A. M. Samad, R. Mukherjee, N. Ganguly, S. Ghosh, CAVES: A dataset to facilitate explainable classification and summarization of concerns towards COVID vaccines, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '22, Association for Computing Machinery, New York, NY, USA, 2022, pp. 3154–3164. URL: https://doi.org/10.1145/3477495.3531745. doi:10.1145/3477495.3531745.
[13] S. Bithel, S. S. Malagi, Unsupervised identification of relevant prior cases, 2021. arXiv:2107.08973.
[14] S. Bithel, CTC: COVID-19 tweet classification using CT-BERT, 2022.
[15] M. Müller, M. Salathé, P. E. Kummervold, COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter, CoRR abs/2005.07503 (2020). URL: https://arxiv.org/abs/2005.07503. arXiv:2005.07503.
[16] R. Zhang, J. Han, C. Liu, P. Gao, A. Zhou, X. Hu, S. Yan, P. Lu, H. Li, Y. Qiao, LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention, 2023. arXiv:2303.16199.