Multi-label Classification of Covid-19 Vaccine Tweet
Palvika Bansal¹, Sumit Das¹, Vikas Rai¹ and Shalini Kumari¹
¹ Thomson Reuters Lab, Bangalore, India

Abstract
This research paper presents a novel approach to the multi-label classification of tweets that express concerns about Covid-19 vaccines. It introduces a fine-tuned BERT-based model, customized for this task, which categorizes the specific concerns expressed within tweets and achieves a macro-F1 score of 0.67 on the shared-task test set. Through extensive data pre-processing, the model accommodates a wide range of concerns. Our findings have implications for public health communication, as they enable precise monitoring of public sentiment and vaccine-related concerns. This research contributes to natural language processing and demonstrates the practical application of advanced machine learning techniques to real-world challenges, underscoring the potential of AI-driven solutions in public health communication.

Keywords: COVID-19 Vaccine Tweets, Sentiment Analysis, Multi-label Classification, BERT, Prefix-Tuning

1. Introduction
Vaccination plays a crucial role in reducing the risk and transmission of a wide range of diseases. Over the past few years, it has emerged as a critical tool in combating the COVID-19 pandemic, and large-scale vaccination efforts remain essential to reduce the prevalence of many diseases. Nonetheless, skepticism towards vaccines persists among many individuals for a variety of reasons, including political factors and concerns about potential side effects. It is imperative to acknowledge and address these diverse concerns. Social media platforms have proven to be invaluable sources of data for gauging public sentiment and opinions about vaccination, because they allow insights to be gathered rapidly from conversations and discussions about vaccines [1].
Forum for Information Retrieval Evaluation, December 15–18, 2023, India
palvika.bansal@thomsonreuters.com (P. Bansal); sumit.das@thomsonreuters.com (S. Das); vikas.rai@thomsonreuters.com (V. Rai); shalini.kumari@thomsonreuters.com (S. Kumari)
© 2023 CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

To build this understanding, our work uses training data from a prior project, "CAVES: A dataset to facilitate explainable classification and summarization of concerns towards COVID vaccines" [2]. Our methodology involved systematic experimentation with a wide range of machine learning and deep learning techniques for categorizing tweets about vaccine-related concerns. We started with foundational approaches such as TF-IDF features and LSTM networks and progressed towards more contextual, BERT-based [3] models. One notable experiment applied prefix tuning [4], a parameter-efficient adaptation technique, to state-of-the-art transformer models, with the aim of capturing nuanced signals in the tweets and improving classification accuracy. To further extract the contextual meaning of each tweet, we experimented with several data processing approaches: identifying named entities, expanding tweets, analyzing tweet sentiment, and analyzing keywords. We also experimented with the state-of-the-art GPT-4 [5] model, providing it with few-shot examples to identify the concerns expressed in a tweet. Our investigations, however, went beyond surveying this broad spectrum of techniques.
We also fine-tuned models specifically to accommodate the idiosyncrasies of tweet data, so that they are attuned to Twitter's concise and informal language style and capture the subtle intricacies of vaccine-related discourse.

2. Task
Our primary aim is to develop an accurate and efficient multi-label classification model that assigns labels to a social media post, specifically a tweet. The labels correspond to the specific concerns and sentiments that the post's author expresses about vaccines. The task involves not only identifying the presence of various concerns but also understanding the nuances and context in which they are discussed, enabling a comprehensive analysis of public sentiment and discourse about vaccines on social media platforms.
In this study, the classification task is centered on a set of predefined concerns about vaccines. These concerns serve as the labels for categorizing tweets, providing a structured framework for analyzing public discourse on vaccine-related topics. The labels are defined as follows:
• Unnecessary: The tweet indicates that vaccines are unnecessary, or that alternate cures are better.
• Mandatory (against mandatory vaccination): The tweet suggests that vaccines should not be made mandatory.
• Pharma (against Big Pharma): The tweet indicates that the Big Pharmaceutical companies are just trying to earn money, or the tweet is against such companies in general because of their history.
• Conspiracy (deeper conspiracy): The tweet suggests some deeper conspiracy, and not just that Big Pharma wants to make money (e.g., vaccines are being used to track people, COVID is a hoax).
• Political (political side of vaccines): The tweet expresses concerns that governments or politicians are pushing their own agenda through the vaccines.
• Country (country of origin): The tweet is against some vaccine because of the country where it was developed or manufactured.
• Rushed (untested/rushed process): The tweet expresses concerns that the vaccines have not been tested properly or that the published data is not accurate.
• Ingredients (vaccine ingredients/technology): The tweet expresses concerns about the ingredients present in the vaccines (e.g., fetal cells, chemicals) or the technology used (e.g., mRNA vaccines can change your DNA).
• Side-effect (side effects/deaths): The tweet expresses concerns about the side effects of the vaccines, including deaths caused.
• Ineffective (vaccine is ineffective): The tweet expresses concerns that the vaccines are not effective enough and are useless.
• Religious (religious reasons): The tweet is against vaccines because of religious reasons.
• None: No specific reason is stated in the tweet, or the reason is other than the ones given above.

3. Related Work
Users turn to micro-blogging platforms such as Twitter for a diverse range of purposes, including expressing their views on the Coronavirus pandemic, sharing personal health updates with their online connections, reporting symptoms, and posting alerts about their own well-being or that of acquaintances. Extensive discussions take place about COVID-19 vaccines and vaccination campaigns, often before individuals receive their vaccine doses. Extracting useful insights from such tweets is a common application in the field of social computing.
In text classification, traditional machine learning techniques such as the Naive Bayes classifier, linear classifiers, and Support Vector Machines (SVMs), as well as deep learning methods including Long Short-Term Memory (LSTM) networks and bidirectional Recurrent Neural Networks (RNNs), have demonstrated their effectiveness. Recent advances in natural language processing have produced notable language models, with BERT (Bidirectional Encoder Representations from Transformers) [3] and its domain-specific counterpart CT-BERT (COVID-Twitter-BERT) [6] at the forefront. Additionally, VaccineBERT [7], a BERT-based model specialized in classifying COVID-19 vaccine-related tweets, has garnered attention.

4. Dataset
The dataset consists of 9,921 tweets and contains no missing values.

4.1. Data Exploration
In this classification task, an individual tweet may be linked with multiple labels. It is therefore important to examine how the labels are distributed within the dataset, since this is vital for categorizing and interpreting the diverse tweets it contains. The label distribution is shown in Table 1.

Table 1
Label distribution within the dataset
Label         Count
side-effect    3805
ineffective    1672
rushed         1477
pharma         1273
mandatory       783
unnecessary     722
none            629
political       626
conspiracy      487
ingredients     436
country         201
religious        64

It is equally important to examine the number of labels assigned to each tweet. Across the entire dataset, 7,936 tweets were assigned only one label, so single-label tweets dominate.
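Both the per-label counts in Table 1 and the number of labels per tweet can be obtained with a simple counter over each tweet's label list. The sketch below assumes the labels are available as one Python list of label strings per tweet (a hypothetical representation, not necessarily the CAVES file format) and uses made-up toy tweets; it also checks the Table 1 totals against the 9,921 tweets in the dataset.

```python
from collections import Counter

def label_statistics(tweet_labels):
    """Return (per-label counts, histogram of number of labels per tweet).

    tweet_labels: list of label lists, one per tweet (hypothetical format).
    """
    label_counts = Counter(label for labels in tweet_labels for label in labels)
    cardinality = Counter(len(labels) for labels in tweet_labels)
    return label_counts, cardinality

# Toy example with made-up label assignments (not taken from the dataset).
toy = [["side-effect"], ["rushed", "side-effect"], ["none"],
       ["pharma", "political", "conspiracy"]]
counts, per_tweet = label_statistics(toy)
print(counts.most_common(1))      # [('side-effect', 2)]
print(sorted(per_tweet.items()))  # [(1, 2), (2, 1), (3, 1)]

# Consistency check using the counts reported in Table 1.
TABLE_1 = {"side-effect": 3805, "ineffective": 1672, "rushed": 1477, "pharma": 1273,
           "mandatory": 783, "unnecessary": 722, "none": 629, "political": 626,
           "conspiracy": 487, "ingredients": 436, "country": 201, "religious": 64}
total_assignments = sum(TABLE_1.values())
mean_labels_per_tweet = total_assignments / 9921
print(total_assignments, round(mean_labels_per_tweet, 2))  # 12175 1.23
```

Table 1 therefore accounts for 12,175 label assignments over 9,921 tweets, about 1.23 labels per tweet. This is consistent with most tweets carrying a single label.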
A further 1,716 tweets carry two labels, and a subset of 269 tweets is linked to three distinct labels, showing that some content in the dataset is categorized in an intricate way. This examination of the label distribution improves our understanding of the dataset's characteristics and of the nature of the classification challenge. We also examined the distribution of tweet lengths, which is shown in Appendix Figure 1.1.

4.2. Trends in the dataset
Label-Entity Mapping in Tweet Text
We mapped the training data labels to the most prevalent entity types found in the tweet text. This analysis was carried out both for individual labels and for combinations of multiple labels; the results are given in Appendix Tables 1.1 and 1.2.
Extraction and Parsing of URL-Embedded HTML Content in Tweet Text
We performed a two-step analysis: we extracted the URLs from the tweet text and then parsed the HTML content behind these URLs. The purpose was to examine the HTML content, particularly the headlines, associated with each URL and compare it with the tweet text. Most of these URLs referred either to other tweet threads or to news media reports, and approximately 20% of the web pages no longer existed. In most cases, the tweet text was concise and often a partial excerpt from the parsed URL content. There were also instances where the context of the tweet text contradicted the information in the HTML content of the URL.
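The two-step URL analysis can be sketched with the Python standard library: a regular expression pulls URLs out of the tweet, and an `html.parser.HTMLParser` subclass extracts the page headline. This sketch rests on assumptions: the URL pattern is a simplification, the page's `<title>` element stands in for the "headline", the tweet and page are made up, and fetching the page (for example with `urllib.request`) is left out so the example runs offline.

```python
import re
from html.parser import HTMLParser

# Simplified URL pattern (an assumption; real tweets may need a stricter one).
URL_RE = re.compile(r"https?://\S+")

def extract_urls(tweet):
    """Return all http(s) URLs found in the tweet text."""
    return URL_RE.findall(tweet)

class TitleParser(HTMLParser):
    """Collects the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def headline(html):
    """Return the stripped <title> text of an HTML document ('' if absent)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

tweet = "Pfizer trial data questioned https://t.co/abc123 #vaccine"  # toy tweet
print(extract_urls(tweet))  # ['https://t.co/abc123']
page = "<html><head><title> Trial data under review </title></head><body>...</body></html>"
print(headline(page))  # Trial data under review
```

The extracted headline can then be compared with the tweet text, for instance by inspection, as in the analysis described above.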
Based on these observations, we concluded that adding this HTML content to the tweet text would not add value and could confuse the model.
Analysis of @Mentioned Users in Tweet Text
We also analyzed the user profiles mentioned (@user) in the tweet text, to explore whether they could offer supplementary information about the type of tweet. However, this effort was hindered by restrictions of the Twitter API, which prevented access to user profiles.
Exploring Entity Types Within Tweet Text
We used Hugging Face's bertweet-tb2_wnut17-ner model to detect entities within the tweet texts. This model is fine-tuned for Named Entity Recognition (NER) on Twitter data and is therefore designed to categorize entities despite the informality, hashtags, and mentions characteristic of tweets. Sample outputs for two tweets are shown below. Given time constraints, this exploration did not yield significant outcomes, and it warrants further investigation in the future.
[
  {'entity_group': 'PER', 'score': 0.9401, 'word': 'JoshBloom@@'},
  {'entity_group': 'ORG', 'score': 0.8423, 'word': 'Pfizer'}
],
[
  {'entity_group': 'ORG', 'score': 0.9042, 'word': 'Lifesitenews'}
]
Label Association with Tweet Length
We investigated whether label assignments correlate with tweet length, to explore whether some labels are preferred for tweets of particular lengths. No direct relationship between label and tweet length was identified; see Appendix Table 1.3.
Sentiment Analysis
We used the SentimentIntensityAnalyzer package for a quick sentiment analysis.
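Appendix Tables 1.3 and 1.4 both report a per-label average of a per-tweet quantity. A minimal sketch of that aggregation follows. It assumes that length is counted in whitespace-separated words and that the sentiment value is a compound-style score in [-1, 1], such as the `compound` entry returned by VADER's `SentimentIntensityAnalyzer.polarity_scores`. The scorer is passed in as a function so that the sketch runs without a sentiment lexicon; `toy_sentiment` and the tweets are hypothetical stand-ins.

```python
from collections import defaultdict

def per_label_mean(texts, label_lists, value_fn):
    """Average value_fn(text) over all tweets carrying each label.

    A tweet with several labels contributes its value to every one of them.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text, labels in zip(texts, label_lists):
        value = value_fn(text)
        for label in labels:
            totals[label] += value
            counts[label] += 1
    return {label: totals[label] / counts[label] for label in totals}

def word_length(text):
    """Tweet length in whitespace-separated words (an assumed definition)."""
    return len(text.split())

def toy_sentiment(text):
    """Hypothetical stand-in for a real sentiment scorer."""
    return -0.5 if "die" in text.lower() else 0.0

texts = ["people will die from this shot", "big pharma just wants money", "no reason given"]
labels = [["side-effect"], ["pharma"], ["side-effect", "none"]]
print(per_label_mean(texts, labels, word_length))
# {'side-effect': 4.5, 'pharma': 5.0, 'none': 3.0}
print(per_label_mean(texts, labels, toy_sentiment))
# {'side-effect': -0.25, 'pharma': 0.0, 'none': 0.0}
```

Passing the scorer in as an argument keeps one aggregation routine for both tables. Replacing `toy_sentiment` with a real analyzer changes only the per-tweet values.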
Because all labels are primarily associated with negative emotions, the average sentiment score leaned negative for every label; see Appendix Table 1.4.
Determining Key Terms in Tweets for the Primary Entity Group of Each Label
We examined the most frequent words associated with each class label after cleaning the tweets and removing stopwords. The results are shown in Appendix Figure 1.2.

5. Pre-processing
To improve the quality of the word embeddings used in the modeling process, we pre-processed the tweets. Tweets generally contain distinctive lexical elements such as hashtags, @username mentions, URLs, "RT" markers, and special characters, which tend to hinder model performance if left unprocessed. We therefore applied the following cleaning steps to the dataset:
• Removing stop words: Commonly used words such as "the", "and", and "in" are removed from the text. We also removed tweet-specific tokens such as "rt", which marks retweets. This step reduces noise and focuses the models on the most meaningful words and phrases.
• Removing URLs: We initially explored using the content behind the URLs to enrich the tweet's meaning, but it added little to the core meaning of the tweet and distorted the results. We therefore removed these links using regular expressions.
• Removing username mentions: Mentions often refer to specific individuals or accounts, so removing them preserves privacy and reduces bias, which keeps the analysis impartial.
• Converting words to lowercase: Lowercasing standardizes the text so that words with different capitalization patterns are treated as identical. This prevents discrepancies in the analysis and simplifies text processing.
• Removing non-alphanumeric characters: We removed special symbols and punctuation marks, which often contribute little to the analysis, to focus on the core linguistic content.
• Tweet text expansion: For labels with less data (country, political, conspiracy, religious, and none), we used GPT-3.5 to expand the tweet content, providing richer context and making the tweet's relevance to its label more explicit. The aim was to assess whether text expansion improves model performance on these challenging labels. Examples of original and expanded tweets are given in Appendix Table 1.5.

6. Methodology
6.1. Models
Fine-tuning DeBERTa Large: In one of our experiments, we fine-tuned the "large" variant of DeBERTa (Decoding-enhanced BERT with Disentangled Attention) [8]. DeBERTa is a Transformer-based neural language model that builds on RoBERTa [9] and improves on the BERT [3] and RoBERTa models with two techniques, a disentangled attention mechanism and an enhanced mask decoder, while being pre-trained on half of the data used for RoBERTa. In the disentangled attention mechanism, each word is represented by two vectors that encode its content and its position, respectively, and the attention weights among words are computed using disentangled matrices on their contents and relative positions. The enhanced mask decoder replaces the output softmax layer to predict the masked tokens during model pre-training.
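For reference, the disentangled attention described above can be written as follows. This paraphrases the formulation in the DeBERTa paper [8] in our own notation: Q^c, K^c and V^c are content projections of the hidden states, Q^r and K^r are projections of shared relative-position embeddings, k is the maximum relative distance, d is the head dimension, and δ(i,j) is the bucketed relative distance from token i to token j. The three terms are content-to-content, content-to-position and position-to-content attention, which is why the scaling factor uses 3d.

```latex
\begin{aligned}
\tilde{A}_{i,j} &= Q^c_i \left(K^c_j\right)^{\top}
                 + Q^c_i \left(K^r_{\delta(i,j)}\right)^{\top}
                 + K^c_j \left(Q^r_{\delta(j,i)}\right)^{\top},\\
H_o &= \operatorname{softmax}\!\left(\tilde{A} / \sqrt{3d}\right) V^c,\\
\delta(i,j) &= \begin{cases}
  0, & i - j \le -k,\\
  2k - 1, & i - j \ge k,\\
  i - j + k, & \text{otherwise.}
\end{cases}
\end{aligned}
```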
In addition, a new virtual adversarial training method is used during fine-tuning to improve the model's generalization on downstream tasks. We used a maximum sequence length of 128 with right padding, a learning rate of 2e-5, and a batch size of 10, and fine-tuned the model for 15 epochs. To prevent overfitting, we used early stopping on the validation loss with a patience of 5.
Prefix Tuning of RoBERTa Large: In another experiment, we employed the "large" variant of RoBERTa (A Robustly Optimized BERT Pretraining Approach) [9], one of the state-of-the-art transformers in natural language processing. RoBERTa builds on the BERT [3] architecture with a more effective training procedure and was trained on a much larger dataset: 160GB of text from BookCorpus, OpenWebText, English Wikipedia and other corpora, which makes it adept at capturing linguistic nuances and contextual representations of text. We chose prefix tuning [4] for RoBERTa Large because it adapts the pre-trained model to our multi-label classification task without modifying its pre-trained weights, so the patterns the model has already learned are preserved. Prefix tuning prepends a sequence of trainable, task-specific continuous vectors (the prefix) to the input and keeps the pre-trained parameters frozen; the prefix guides the model to tailor its representations to the task while leveraging the knowledge already encoded in the model. We used 128 virtual tokens for the prefix and, based on the distribution of tweet lengths, 100 tokens to encode each tweet. We trained with a learning rate of 1e-2 and a batch size of 8 for 15 epochs, using the BCEWithLogitsLoss loss function to suit the multi-label setting. To prevent overfitting, we used early stopping on the validation loss with a patience of 5. In this experiment, a tweet was assigned every class whose predicted probability exceeded a threshold of 0.5.

6.2. Experimental Setup
Our experimental framework was designed to ensure robust model development and evaluation. We randomly shuffled the dataset and split it into an 80% training set and a 20% validation set, both of which were pre-processed using the steps described in Section 5. Because tweets can carry multiple labels, we used a multi-label binarizer to encode the labels as binary indicator vectors. To prevent overfitting, we used early stopping with configurable parameters. We varied the model hyperparameters systematically across experiments; the configurations are described in Section 6.1.

6.3. Predictions
To generate predictions for the final test data, we fine-tuned several language-model architectures for multi-label text classification, as described in Section 6.1. For each test tweet, we predicted a probability score for every class. We experimented with different probability thresholds for each model and selected them based on the macro-F1 metric; classes whose probability exceeded the selected threshold were assigned to the tweet. We also post-processed the predictions: whenever a model predicted the "none" label together with other labels, we removed "none" and kept the other predicted labels. Because of the thresholds, a model may assign no label at all to a few tweets; we accepted this in exchange for more precise results. We submitted three prediction files from different models, each containing the tweet ID and the predicted classes.

6.4. Additional Modeling Experiments
In addition to the submitted models, we conducted a series of experiments with other feature sets and model architectures.
However, these experiments did not yield better results and were therefore not included in the final submission. This section describes these alternative approaches, which provide context for the selection of the final models.
BERTweet Large: We selected the BERTweet Large model because it specializes in Twitter data. It is pre-trained on a massive Twitter corpus, which makes it adept at capturing the linguistic nuances and contextual intricacies of tweets. BERTweet [10] is the first public large-scale language model pre-trained for English Tweets and is trained with the RoBERTa pre-training procedure. Its pre-training corpus consists of 850M English Tweets (16B word tokens, 80GB): 845M Tweets streamed from 01/2012 to 08/2019 and 5M Tweets related to the COVID-19 pandemic. We fine-tuned BERTweet Large on our training dataset for the multi-label classification task, adjusting the learning rate and batch size. We used a learning rate of 2e-5 and a batch size of 10 and fine-tuned the model for 10 epochs, with an early stopping threshold of 0.001 to prevent overfitting. In this experiment, a tweet was assigned every class whose predicted probability exceeded a threshold of 0.2.
TF-IDF Vectorizer with a Deep Neural Network: After pre-processing the text, we used a TF-IDF vectorizer to create numerical representations of the text. We then trained a deep neural network on these features, consisting of dense layers with dropout layers to reduce overfitting.
LSTM with GloVe Twitter Embeddings: In another experiment, we built an LSTM model, using the GloVe Twitter embeddings¹ (2B tweets, 27B tokens, 1.2M vocabulary, uncased, 100d) to create features.
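GloVe vectors are distributed as plain text with one token per line followed by its vector components. Below is a minimal sketch of turning such a file into an embedding matrix for a model vocabulary. The vocabulary, the index convention (row 0 reserved for padding) and the zero vectors for out-of-vocabulary words are illustrative assumptions, not the authors' setup. A two-line in-memory file with 3-dimensional vectors stands in for the 100d Twitter file.

```python
import io

def load_glove(lines, dim):
    """Parse 'token v1 v2 ... v_dim' lines into a dict of float lists."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def embedding_matrix(vocab, vectors, dim):
    """Row i holds the vector of the word with index i; row 0 is padding.

    Words missing from GloVe keep an all-zero row.
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    for word, index in vocab.items():
        if word in vectors:
            matrix[index] = vectors[word]
    return matrix

glove_file = io.StringIO("vaccine 0.1 0.2 0.3\npfizer -0.4 0.5 0.0\n")
vectors = load_glove(glove_file, dim=3)
vocab = {"vaccine": 1, "pfizer": 2, "hoax": 3}  # hypothetical tokenizer vocabulary
matrix = embedding_matrix(vocab, vectors, dim=3)
print(matrix[1])  # [0.1, 0.2, 0.3]
print(matrix[3])  # [0.0, 0.0, 0.0]
```

A matrix like this would typically initialize the embedding layer that feeds the LSTM, either frozen or fine-tuned.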
On top of these features, the model uses a dropout layer to reduce overfitting, an LSTM layer, and a dense layer as the multi-label classifier, with a sigmoid activation at the output layer and binary cross-entropy as the loss. This experiment achieved a macro-average F1 score of 0.296 on the validation set with a threshold of 0.2.
¹ https://nlp.stanford.edu/projects/glove/
Experiment with GPT-4: We used GPT-4 [5] to generate labels for the tweets in the validation set, giving it few-shot examples of all class labels along with a system prompt and a user prompt (see Appendix B). We used a temperature of 0, to make the output more deterministic, and a top_p of 1.0. Analyzing the results, we found that GPT-4 predicted at least two labels for most tweets, even though most tweets in our data have a single label. This significantly lowered the precision of the results.
Expanded Tweet Experiment: As mentioned in Section 5, we expanded the tweets of certain classes to improve performance on those classes. We prefix-tuned the RoBERTa Large model on a training set that used expanded tweets for these classes and the original tweets for all other classes, and we evaluated it on a validation set of unexpanded tweets. We did not see any performance improvement over prefix tuning of RoBERTa Large on the original tweets.
Label Enhancement and Similarity Matching: In this experiment, we used the GPT-3.5 model to enhance the label descriptions and computed embeddings of the enhanced labels with the BERTweet model. At inference time, we calculated the cosine similarity between the embeddings of the enhanced labels and those of the tweets. With a similarity threshold of 0.8, this approach did not perform well.

7. Evaluation
The task was evaluated using the macro-F1 score over the 12 classes. The results of our submitted automated runs on the test set are shown in Table 2.

Table 2
Results on the submitted test set
Sr No.  Team name         Model          Details        macro-F1 score  Jaccard score  Rank
1       Cognitive Coders  DeBERTa Large  Fine-tuning    0.67            0.70           5
2       Cognitive Coders  RoBERTa Large  Prefix tuning  0.64            0.65           9

8. Conclusion and Future Work
In the final evaluation, we experimented with two language models: DeBERTa Large (fine-tuning) and RoBERTa Large (prefix tuning). Our objective was to explore the performance of these models on a complex dataset in which none of the labels showed a direct correlation with entities, sentiment, length, or word characteristics. Transformer-based models outperformed traditional classifiers in handling the intricacies of our dataset, which underscores the potential of transformer-based architectures for multifaceted classification tasks. We also explored data augmentation strategies, such as using large language models (LLMs) to expand tweet text and provide additional context, with the expectation that this could improve performance for labels with few data points, such as religious, country, and ingredients. Increasing the dataset size for these labels may improve classification accuracy, since transformer-based models are data-hungry and generally benefit from larger datasets. In addition, we employed Hugging Face's bertweet-tb2_wnut17-ner model to detect entities in tweet texts. Because it is specialized for social media data, it can categorize entities in the context of Twitter's informal language, hashtags, and mentions. Integrating it could enable more comprehensive analyses of label assignments, sentiment, and tweet length, shedding light on the entity-label relationships within our dataset.
However, due to time constraints, this exploration yielded limited outcomes and needs further investigation in the future. In summary, our study highlights the promising performance of transformer-based models on complex multi-label classification tasks. We recommend that future research focus on data augmentation and dataset expansion to further improve model effectiveness, particularly when labeled data is limited.

References
[1] S. Poddar, M. Basu, K. Ghosh, S. Ghosh, Overview of the FIRE 2023 track: Artificial Intelligence on Social Media (AISoMe), in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023.
[2] S. Poddar, A. M. Samad, R. Mukherjee, N. Ganguly, S. Ghosh, CAVES: A dataset to facilitate explainable classification and summarization of concerns towards COVID vaccines, 2022. arXiv:2204.13746.
[3] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, 2019. arXiv:1810.04805.
[4] X. L. Li, P. Liang, Prefix-tuning: Optimizing continuous prompts for generation, 2021. arXiv:2101.00190.
[5] OpenAI, GPT-4 technical report, 2023. arXiv:2303.08774.
[6] M. Müller, M. Salathé, P. E. Kummervold, COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter, 2020. arXiv:2005.07503.
[7] S. Bithel, S. Verma, VaccineBERT: BERT for COVID-19 vaccine tweet classification, in: Working Notes of FIRE, 13th Forum for Information Retrieval Evaluation, FIRE-WN 2021, 2021, pp. 1199–1203.
[8] P. He, X. Liu, J. Gao, W. Chen, DeBERTa: Decoding-enhanced BERT with disentangled attention, 2021. arXiv:2006.03654.
[9] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, V. Stoyanov, RoBERTa: A robustly optimized BERT pretraining approach, 2019. arXiv:1907.11692.
[10] D. Q. Nguyen, T. Vu, A. T. Nguyen, BERTweet: A pre-trained language model for English Tweets, in: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 9–14.

A. Data Exploration and Observations
This appendix presents our data exploration across several dimensions. Specifically, we examine tweet length, analyze word frequency patterns within each label, extract the entities relevant to each label, investigate the correlation between label assignments and tweet length, and compare original with expanded tweets.

Figure 1.1: Tweet lengths analysis
Figure 1.2: Word frequency analysis per label

Table 1.1
Map Labels with Entity Types (all labels)
Labels                              Top Entity Group
conspiracy                          [ORG,PER]
conspiracy country                  [PER,MISC]
conspiracy country ingredients      [ORG,MISC]
conspiracy country pharma           [MISC]
conspiracy country side-effect      [MISC,LOC]
conspiracy ineffective              [PER]
conspiracy ineffective ingredients  [ORG,LOC,MISC]
conspiracy ineffective side-effect  [PER]
conspiracy ingredients              [MISC,ORG]
conspiracy ingredients mandatory    [PER,MISC]
conspiracy ingredients pharma       [PER,ORG]
conspiracy ingredients religious    [MISC,PER]
...                                 ...

Table 1.2
Map Labels with Entity Types (unique labels)
Labels        Top Entity Group
conspiracy    [ORG,PER]
country       [PER,MISC,ORG]
ineffective   [PER,MISC,ORG]
ingredients   [PER,ORG]
mandatory     [PER]
none          []
pharma        [ORG,PER]
political     [MISC,PER]
religious     [MISC]
rushed        [ORG,LOC]
side-effect   [ORG,LOC,MISC]
unnecessary   [MISC]

Table 1.3
Label association with tweet length analysis
Label         Tweet Length (mean)
side-effect   34.258344
ineffective   37.519737
rushed        38.111036
pharma        36.656716
mandatory     35.975734
unnecessary   38.738227
none          25.817170
political     38.599042
conspiracy    37.882957
ingredients   35.243119
country       31.129353
religious     33.031250

Table 1.4
Tweet sentiment analysis
Label         Sentiment (mean)
side-effect   -0.265099
ineffective   -0.050121
rushed        -0.051898
pharma        -0.047567
mandatory     -0.098390
unnecessary   -0.113949
none          -0.057131
political     -0.125577
conspiracy    -0.152121
ingredients   -0.096524
country       -0.027262
religious     -0.180167

Table 1.5
Original and expanded tweet text analysis
Label: conspiracy
Original tweet: Michael Yeadon, a former employee of Pfizer, said that the government rollout of the COVID-19 vaccine is an attempt at "mass depopulation" with booster recipients expected to die ...
Expanded tweet: It is important to note that the claim made by Michael Yeadon, a former employee of Pfizer, that the government rollout of the COVID-19 vaccine is an attempt at "mass depopulation" with booster recipients expected to die...
Label: political
Original tweet: @MrStache9 Well i believe there won't be an election until Trudick get enough covid vaccine into enough people to claim he did something right...
Expanded tweet: The statement seems to suggest that the Canadian government, led by Prime Minister Justin Trudeau, will likely wait until a significant portion of the population has been vaccinated against COVID-19...
Label: country
Original tweet: I'd rather catch Covid than take that Russian vaccine!
Expanded tweet: It seems that the person who wrote this text is expressing their skepticism or distrust towards the Russian COVID-19 vaccine. They are saying that they would prefer to risk getting infected with COVID-19 than to take the Russian vaccine...

B. Experiments
GPT-4 Prompt
System Prompt: You are a helpful assistant that will help in providing the most relevant labels to a social media post from a list of labels that express significant concern towards the vaccine.
User Prompt: Assign most relevant labels to a social media post (particularly, a tweet) according to the specific concern(s) towards vaccines as expressed by the author of the post. Note that a tweet can have more than one label (concern), e.g., a tweet expressing more than 1 different concerns towards vaccines will have more labels. We consider the following concerns towards vaccines as the labels for this classification task: {labels with description}
tweet text: {text}
Response: list of labels separated by space
Sample of few-shot examples:
[
    {
        "role": "system",
        "name": "example_user",
        "content": '''@kentlivenews Let's hope Boris Johnson isn't one of those new trainees to stick people with the vaccine. Not a good picture to use.''',
    },
    {
        "role": "system",
        "name": "example_assistant",
        "content": 'Political',
    },
]

C. Online Resources
• GitHub