BERT-Powered Multi-label Classifier: Analyzing Public COVID Vaccination Discourse

Ranjit Patro1, Asutosh Mishra2
1 Indian Institute of Science Education and Research Berhampur, Odisha, India
2 Indian Institute of Science Education and Research Berhampur, Odisha, India

Abstract

In response to the pressing need to understand public sentiment against vaccination in the digital era, this study employs social media data to build a multi-label, multi-class classifier. To achieve effective label prediction, a pre-trained BERT model is combined with targeted preprocessing techniques and a brute-force threshold selection strategy. This research sheds light on the complex terrain of vaccination opinion by analyzing a wide range of concerns, from the safety of vaccines to their potential political and religious ramifications. This study is part of the Artificial Intelligence on Social Media (AISoMe) track of the Forum for Information Retrieval Evaluation (FIRE) 2023 conference. Evaluation on the test dataset yields a Macro-F1 score of 0.7.

Keywords
Multi-Label Classifier, COVID-19 Vaccine Tweets, COVID-Twitter-BERT

1. Introduction

Vaccination has historically served as a fundamental pillar of public health, protecting communities against the grave threats posed by deadly diseases. In today's globalized society, the discussion around vaccines has gained significant traction on social media platforms. The COVID-19 pandemic presented unique and unparalleled challenges, drawing increased attention to the crucial role of immunization in protecting public health. Nevertheless, the current period is characterized not only by significant scientific progress, but also by a growing sense of doubt regarding vaccines. The discourse surrounding vaccines on social media encompasses a wide range of perspectives and concerns, resulting in a complex and multifaceted narrative.
Utilizing data from social media platforms, this study investigates this complex landscape. Our primary objective is to develop a robust multi-label, multi-class classifier capable of categorizing social media conversations, specifically tweets, by accurately identifying the vaccine-related concerns expressed by their authors. The study also examines the multifaceted nature of vaccine skepticism, which encompasses concerns regarding effectiveness, safety, the role of the pharmaceutical sector, and broader socio-political and cultural factors.

Forum for Information Retrieval Evaluation, December 15-18, 2023, India
ranjitp20@iiserbpr.ac.in (R. Patro); asutosh21@iiserbpr.ac.in (A. Mishra)
https://ranjitpatro.netlify.com/ (R. Patro); https://asutosh-mishra.netlify.com/ (A. Mishra)
ORCID: 0009-0006-4701-7098 (R. Patro); 0009-0003-5376-3431 (A. Mishra)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

The value of this work extends beyond its timely response to the current vaccine controversy: it also holds the potential to inform public health measures. Through an in-depth analysis of the intricate concerns and beliefs that underpin vaccine reluctance, this study provides valuable insights into vaccination sentiment. Such a classifier facilitates the understanding and analysis of public opinion toward vaccines in the era of digital communication, thereby promoting the development of more informed public health interventions.

2. Task

Under the Artificial Intelligence on Social Media (AISoMe) track [1], in this paper we introduce an effective approach to the challenge of constructing a robust multi-label, multi-class classifier for categorizing social media posts, specifically tweets, based on the various concerns expressed by their authors regarding vaccines. Our classification task involves assigning the most appropriate label(s) to each tweet from a set of potential concerns associated with vaccines. These concerns span a wide spectrum of opinions and viewpoints in the discourse around vaccines and serve as the classification labels. They are:

• Unnecessary: The tweet indicates vaccines are unnecessary, or that alternate cures are better.
• Mandatory: Against mandatory vaccination — the tweet suggests that vaccines should not be made mandatory.
• Pharma: Against Big Pharma — the tweet indicates that the big pharmaceutical companies are just trying to earn money, or the tweet is against such companies in general because of their history.
• Conspiracy: Deeper conspiracy — the tweet suggests some deeper conspiracy, and not just that Big Pharma wants to make money (e.g. vaccines are being used to track people, COVID is a hoax).
• Political: Political side of vaccines — the tweet expresses concerns that governments or politicians are pushing their own agenda through vaccines.
• Country: Country of origin — the tweet is against some vaccine because of the country where it was developed or manufactured.
• Rushed: Untested or rushed process — the tweet expresses concerns that the vaccines have not been tested properly or that the published data is not accurate.
• Ingredients: Vaccine ingredients or technology — the tweet expresses concerns about the ingredients present in the vaccines (e.g. fetal cells, chemicals) or the technology used (e.g.
mRNA vaccines can change your DNA).
• Side-effect: Side effects or deaths — the tweet expresses concerns about the side effects of the vaccines, including deaths caused.
• Ineffective: Vaccine is ineffective — the tweet expresses concerns that vaccines are not effective enough and are useless.
• Religious: Religious reasons — the tweet is against vaccines for religious reasons.
• None: No specific reason stated in the tweet, or some reason other than the given ones.

A single tweet can encompass one or multiple distinct concerns regarding vaccine viewpoints, as demonstrated by the following examples:

• "It begins. Please find safe alternatives to this vaccine. UK issues allergy warning about Pfizer COVID-19 vaccine after patients fall ill https://t.co/JEHgCLGIbv via @nypost"; Labels: side-effect
• "@BorisJohnson @CMO_England @MattHancock THIS IS BEYOND INCOMPETENCE People should refuse this vaccine as it's not going to be administered correctly. Only a matter of time before Covid19 kills one of your patients who have had only a fraction of intended protection!"; Labels: none
• "Dare I suggest something more sinister with Johnson suggesting a vaccine passport aimed at the reopening of pubs would force the young to seek out the vaccine. A vaccine that is experimental offers the individual no protection from getting or passing on the virus"; Labels: mandatory, ineffective
• "jeffmcnamee @padakitty @Amanda77197114 @alexanderchee BREAKING: FDA announces 2 deaths of Pfizer vaccine trial participants from "serious adverse events." Fed Up Democrats Say NO to Forced Vaccines in NY"; Labels: side-effect, mandatory, political

3. Related Work

Today, users of microblogs such as Twitter contribute a wide variety of content, including their ideas and feelings on topics such as the coronavirus, COVID-19 immunizations, and vaccination campaigns. Extracting meaningful information from textual tweets has become an integral aspect of social computing.
Text classification in particular has been successful through a variety of methods, ranging from conventional machine learning techniques such as Naive Bayes, linear classifiers, and Support Vector Machines to deep learning approaches such as Long Short-Term Memory (LSTM) networks and Bidirectional Recurrent Neural Networks. In addition, modern language models such as BERT (Bidirectional Encoder Representations from Transformers) [2], its domain-specific variant CT-BERT (COVID-Twitter-BERT) [3], and the improved CT-BERT-V2 (COVID-Twitter-BERT-V2) exemplify the cutting-edge advancements in natural language processing.

3.1. BERT

The bidirectional contextual comprehension of BERT (Bidirectional Encoder Representations from Transformers) allows it to capture the nuanced details in textual input, making it a key Natural Language Processing (NLP) paradigm. Pre-trained on vast unlabeled text corpora through tasks such as Masked Language Modeling and Next Sentence Prediction, BERT acquires a profound linguistic understanding. Its effectiveness is further underscored by its flexibility: it can easily be fine-tuned for specific downstream NLP tasks by incorporating task-specific layers, which consistently leads to strong results across a wide range of applications.

4. Dataset

In this study, we utilize the train and test datasets provided under the AISoMe track. The carefully curated train dataset comprises a corpus of 9,921 anti-vaccine tweets focused on COVID vaccines, originally posted on Twitter during 2020-21 [4]. Each of these tweets has been annotated by human experts with one or more of the aforementioned labels to facilitate fine-grained analysis. The train dataset thus contains the annotated tweets together with their corresponding tweet IDs and assigned labels.
Note that a single tweet may exhibit multiple labels, reflecting the multifaceted nature of vaccine-related concerns. The test dataset consists of 486 tweets, furnished with tweet IDs but unlabelled; it covers discussions of a spectrum of vaccines, extending beyond COVID vaccines to other vaccine types such as the MMR vaccine and the flu vaccine. This dataset serves as the foundation for our research, enabling nuanced exploration and analysis of diverse vaccine-related concerns.

5. Pre-processing

The tweets in the provided dataset exhibit a diverse range of lexical elements, including '@username' mentions, HTTP URLs, hashtags, and special characters such as emojis. While these elements may convey contextual information in certain settings, they introduce noise into the dataset, potentially hindering model performance, particularly in the context of our study. To ensure the integrity and effectiveness of our analysis, we devised a systematic data-cleaning pipeline as an integral component of our pre-processing procedure. The pipeline consists of the following steps:

• Conversion to lowercase: We first convert all sentences to lowercase. This uniform casing ensures that the analysis treats words and phrases consistently, reducing inconsistencies arising from varying letter cases.
• Removal of non-alphanumeric characters: Leveraging Python's regular expression library, we eliminate all non-alphanumeric characters from the text. This step mitigates noise introduced by non-textual elements.
• Elimination of URLs: Given the presence of URLs in the raw tweet data, we employ regular expressions to systematically remove them.
Specifically, any word beginning with "http" and followed by one or more non-whitespace characters is excised, effectively eradicating URLs from the dataset.
• Exclusion of usernames: Twitter uses the '@username' format to mention specific individuals within tweets. Accordingly, we remove all usernames by identifying words containing the character '@' and omitting them from the text.
• Stopword removal: To prioritize meaningful information, we also remove stopwords (commonly occurring words like "the," "a," "an," and "in") that typically contribute little substantive value to the analysis.

6. Methodology

6.1. Model

CT-BERT-V2 is a transformer-based language model for analyzing Twitter discourse during the COVID-19 pandemic. It stands out for its domain-specific focus, having been pre-trained on a substantial corpus of tweets containing relevant keywords such as "wuhan", "ncov", "coronavirus", "covid" and "sars-cov-2", posted from January 12 to July 5, 2020. CT-BERT-V2 is initialized from BERT-Large and trained on over 97 million tweets, amounting to 1.2 billion training examples. This domain-specific training allows CT-BERT-V2 to decipher COVID-related Twitter discussions, making it a powerful tool for analyzing pandemic data.

For our work on the given dataset, to improve model robustness and mitigate overfitting, we add a dropout layer with probability 0.5 on top of the pre-trained CT-BERT-V2. A linear layer then maps the 1024-dimensional BERT-Large embeddings to an 11-dimensional output logit vector. These logits form the basis for loss calculation and are further passed through a sigmoid activation function, yielding the output probabilities for all 11 labels (every label except "None").
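The architecture just described can be sketched in PyTorch. This is a minimal illustration, not the authors' actual training code: the class name, argument names, and the commented-out encoder loading are our own, and the Hugging Face model identifier is an assumption based on the published CT-BERT-V2 release.

```python
import torch
import torch.nn as nn

class TweetConcernClassifier(nn.Module):
    """Sketch of the head in Section 6.1: dropout + linear over 1024-d embeddings."""

    def __init__(self, hidden_size: int = 1024, num_labels: int = 11,
                 dropout_prob: float = 0.5):
        super().__init__()
        self.dropout = nn.Dropout(dropout_prob)               # p = 0.5, as described
        self.classifier = nn.Linear(hidden_size, num_labels)  # 1024 -> 11 logits

    def forward(self, pooled_embedding: torch.Tensor) -> torch.Tensor:
        # Returns raw logits: the BCE-with-logits loss consumes these directly,
        # and a sigmoid is applied separately to obtain per-label probabilities.
        return self.classifier(self.dropout(pooled_embedding))

# Obtaining the pooled embedding requires the `transformers` package and a model
# download, so it is shown only as a comment (model id assumed from the release):
#   from transformers import AutoModel
#   encoder = AutoModel.from_pretrained("digitalepidemiologylab/covid-twitter-bert-v2")
```

The head outputs logits rather than probabilities so that the numerically stabler BCE-with-logits loss can be used during training, with the sigmoid applied only when probabilities are needed.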
6.2. Experimental Setup

In our experimental setup, after pre-processing, we applied a data transformation step using sklearn's MultiLabelBinarizer, which binarizes the training labels via one-hot encoding. During this transformation we excluded the 'none' column, recognizing it as a dependent label that is present only when all other labels are absent. This deliberate step, analogous to avoiding the dummy variable trap, was essential for the accuracy of our results. The final dataset thus consists of 11 labels, with a tweet treated as "none" when it has a value of 0 for all other labels. Following this transformation, we partitioned the dataset into distinct training and validation sets while preserving the distribution of class instances within each set. For fine-tuning CT-BERT-V2 we used only the pre-processed training data, while the validation data was used to evaluate model performance. To calculate the loss, we applied the Binary Cross-Entropy with Logits loss function to the logits generated by the model. These logits were then passed through a sigmoid function, producing 11 normalized values, each corresponding to the probability of a specific label.1

6.3. Prediction

In this phase, we faced the challenge of predicting tweets with multiple labels. To address this, we used a threshold mechanism that returns, for each tweet, all of the 11 labels whose probabilities exceed a certain threshold. The 'none' label is chosen when none of the 11 labels exceeds its threshold. To optimize this threshold-setting process effectively, we adopted a greedy approach.

1 GitHub link to the work: https://github.com/Ranjit246/AISoME_FIRE_2023
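The greedy threshold mechanism and the 'none' fallback can be sketched as follows. This is a simplified illustration under our own naming: the function names and label strings are hypothetical, and the repeated re-validation and fine-tuning of thresholds across different splits described in the paper is omitted.

```python
import numpy as np
from sklearn.metrics import f1_score

# The 11 modelled labels; "none" is only a fallback, never predicted directly.
LABELS = ["unnecessary", "mandatory", "pharma", "conspiracy", "political",
          "country", "rushed", "ingredients", "side-effect", "ineffective",
          "religious"]

def best_thresholds(probs: np.ndarray, y_true: np.ndarray,
                    grid: np.ndarray = np.linspace(0.0, 1.0, 100)) -> np.ndarray:
    """For each label, pick the grid threshold maximizing that label's F1 score
    on a validation split (probs and y_true are (n_samples, n_labels) arrays)."""
    thresholds = np.zeros(probs.shape[1])
    for j in range(probs.shape[1]):
        scores = [f1_score(y_true[:, j], probs[:, j] >= t, zero_division=0)
                  for t in grid]
        thresholds[j] = grid[int(np.argmax(scores))]
    return thresholds

def predict(probs: np.ndarray, thresholds: np.ndarray, labels=LABELS):
    """Return, per tweet, every label whose probability clears its threshold,
    falling back to ['none'] when no label fires."""
    predictions = []
    for row in probs:
        chosen = [lab for lab, p, t in zip(labels, row, thresholds) if p >= t]
        predictions.append(chosen if chosen else ["none"])
    return predictions
```

Searching each label's threshold independently keeps the procedure cheap (11 × 100 evaluations) at the cost of ignoring interactions between labels, which is why the resulting thresholds were validated again on further splits.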
Recognizing the variations in sigmoid output caused by the imbalanced training data, we chose to employ distinct thresholds for each of the 11 labels. For each label, we exhaustively explored 100 candidate threshold values generated with numpy's linspace over the range 0 to 1. The threshold selected for each label was the one achieving the maximum macro-F1 score for that specific label on a randomized validation split. This process was repeated for all 11 labels, producing a list of 11 "best thresholds". These thresholds were then validated on different splits and further fine-tuned to yield the final list of threshold values. Ultimately, classes with probabilities greater than or equal to their corresponding class threshold were returned as the final predicted classes, ensuring a precise and robust multi-label classification outcome.

7. Evaluation

Results in the AISoMe track are assessed using the standard classification metric of Macro-F1 score over the 12 distinct classes. Table 1 presents the outcome of our submission for the task. Our model achieved a Macro-F1 score of 0.7 and a Jaccard score of 0.71. This performance underscores the effectiveness of our approach in addressing the challenges posed by multi-label classification in the AISoMe track.

Table 1
Team IISERBPR-NLP, result of the AISoMe track

Run File             Summary of Methodology                  Macro-F1   Jaccard   Rank
submission-bert.csv  fine-tuned BERT with best thresholds    0.7        0.71      3

8. Conclusion and Future Work

In our study, we used COVID-Twitter-BERT, a transformer-based model pre-trained on a large corpus of COVID-19-related tweets, to efficiently assign vaccine-related labels in a challenging multi-label, multi-class task.
Transformer-based models are data-hungry, so we plan to investigate data augmentation techniques to improve our model. We also plan to explore adversarial training to strengthen it. These future research directions will allow us to better analyze vaccine concerns in social media discussions. In addition, we plan to expand our dataset sources to additional social media platforms, optimize our model through further fine-tuning and hyperparameter tuning, prioritize model explainability, consider real-time monitoring capabilities, address ethical concerns, and foster collaborative partnerships for domain-specific insights to enable practical applications.

References

[1] S. Poddar, M. Basu, K. Ghosh, S. Ghosh, Overview of the FIRE 2023 track: Artificial Intelligence on Social Media (AISoMe), in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023.
[2] J. Devlin, M. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.org/abs/1810.04805.
[3] M. Müller, M. Salathé, P. E. Kummervold, COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter, CoRR abs/2005.07503 (2020). URL: https://arxiv.org/abs/2005.07503.
[4] S. Poddar, A. M. Samad, R. Mukherjee, N. Ganguly, S. Ghosh, CAVES: A dataset to facilitate explainable classification and summarization of concerns towards COVID vaccines, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022, pp. 3154-3164.