Overview of the FIRE 2023 Track: Artificial Intelligence on Social Media (AISoMe)

Soham Poddar¹, Moumita Basu², Kripabandhu Ghosh³ and Saptarshi Ghosh¹
¹ Indian Institute of Technology, Kharagpur.
² Amity University, Kolkata.
³ Indian Institute of Science Education and Research, Kolkata.

Abstract
The COVID-19 pandemic showed the importance of vaccination at a large scale. However, quite often people expressed different concerns they had towards vaccines which made them hesitant to take them. Some people were concerned about the potential side-effects of vaccines, while some believed that the vaccines were not necessary due to the disease being mild. These concerns were frequently shared on social media sites such as Twitter. The FIRE 2023 AISoMe track focused on identifying these specific concern(s) that people have towards vaccines from tweets, as a 12-class multi-label classification task.

Keywords
Twitter, microblogs, COVID-19, vaccine concerns, tweet, multi-label classification

1. Introduction

Social media sites are rich sources of real-time information about people’s opinions on various topics. The Artificial Intelligence on Social Media (AISoMe) track aims to provide datasets and shared tasks for development of AI techniques (particularly, Machine Learning and NLP techniques) for utilizing social media data for diverse practical applications. The AISoMe 2023 track focused on a social media classification problem in the healthcare domain, which is as follows.

During pandemics such as COVID-19 where complete vaccination is the primary long-term solution to fight against the disease, social media can be utilized to understand public sentiments towards vaccines [1, 2]. In particular, many people are skeptical/hesitant about the use of vaccines owing to various reasons, including the politics involved, the potential side-effects of vaccines, and the fact that vaccines have been rushed into production.
We identified 11 such specific reasons (concerns about vaccines) in our prior work [3]. Together with a ‘None’ class for tweets that state no specific reason, these form the 12 classes listed in Table 1 along with their descriptions. Examples of tweets with these labels are given in Table 2. It is important to understand the specific concerns people have towards vaccines, so that their concerns can be addressed. The AISoMe 2023 track focused on this task of labeling (classifying) a tweet with one or more of these concerns against vaccines. This is important since a person unwilling to take vaccines due to the side-effects of vaccines needs different persuasion and reasoning than someone who is hesitant to take vaccines due to the corruption in politics.

FIRE’23: Forum for Information Retrieval Evaluation, December 15-18, 2023, India
sohampoddar26@gmail.com (S. Poddar); moumitabasu0979@gmail.com (M. Basu); kripa.ghosh@gmail.com (K. Ghosh); saptarshi.ghosh@gmail.com (S. Ghosh)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Label         Description
Conspiracy    Deeper Conspiracy: The tweet suggests some deeper conspiracy, and not just that the Big Pharma want to make money (e.g., vaccines are being used to track people, COVID is a hoax).
Country       Country of origin: The tweet is against some vaccine because of the country where it was developed/manufactured.
Ineffective   Vaccine is ineffective: The tweet expresses concerns that the vaccines are not effective enough and are useless.
Ingredients   Vaccine Ingredients/technology: The tweet expresses concerns about the ingredients present in the vaccines (e.g., fetal cells, chemicals) or the technology used (e.g., mRNA vaccines can change your DNA).
Mandatory     Against mandatory vaccination: The tweet suggests that vaccines should not be made mandatory.
Pharma        Against Big Pharma: The tweet indicates that the Big Pharmaceutical companies are just trying to earn money, or is against such companies in general because of their history.
Political     Political side of vaccines: The tweet expresses concerns that the governments/politicians are pushing their own agenda through the vaccines.
Religious     The tweet opposes vaccines due to religious reasons.
Rushed        Untested/Rushed Process: The tweet expresses concerns that the vaccines have not been tested properly or that the published data is not accurate.
Side-effect   Side Effects/Deaths: The tweet expresses concerns about the side effects of the vaccines, including deaths.
Unnecessary   The tweet indicates vaccines are unnecessary, or that alternate cures are better.
None          No specific reason stated in the tweet, or some reason other than the given ones.

Table 1
The different classes/labels (concerns or objections towards vaccines) in the CAVES dataset [3] along with their descriptions.

2. The Datasets and Evaluation Metrics

This section describes the train and test datasets used for the track, and also describes the metrics used for evaluating the submitted runs/methods over the test dataset.

2.1. The training / validation dataset

For training and validation, we utilize the ‘CAVES’ dataset from our prior work [3]. This dataset contains 9,921 anti-vaccine tweets about COVID vaccines (that were posted during 2020-21), where each tweet has been labelled with one or more of the 12 classes (given in Table 1) by human annotators. Table 2 shows some examples of tweets from this dataset, along with their labels. More details about the data collection and annotation process of the CAVES dataset can be found in the prior work [3].

Tweet excerpt: STOP TAKING TOXIC VAX and expose COVID hoax and murders with morphine and ventillators. there is No covid!
Labels: ingredients, conspiracy, unnecessary

Tweet excerpt: Please don’t push vaccine on us make it voluntary. We don’t trust anything to do with Bill Gates pushing their agenda of vaccine chips!!
Labels: pharma, mandatory, ingredients

Tweet excerpt: The reason insurance companies won’t pay out if you experience the inevitable adverse reactions, including death is because it is an “Experimental Vaccine”
Labels: side-effect, rushed

Tweet excerpt: Would you want the Russian vaccine? If not, you shouldn’t want one that’s been pushed through for political reasons either.
Labels: political, country

Tweet excerpt: Catholic leaders are advising Catholics that the COVID-19 vaccine from Johnson & Johnson is "morally compromised"
Labels: religious

Tweet excerpt: I’m NOT taking your damn vaccine. Keep your conspiracy out of my veins!
Labels: none

Table 2
Examples of tweets with their labels and explanations, from the CAVES dataset. The explanations for different labels are highlighted in italics.

2.2. The evaluation dataset

For evaluation, we introduce a new dataset, developed in a similar fashion as the CAVES dataset. This dataset contains 486 tweets labelled into the same 12 classes. However, these tweets are not only about COVID vaccines but also about other types of vaccines (e.g., MMR vaccine, Flu vaccine), from both the COVID-era as well as pre-COVID times.

2.3. Evaluation method

The participating teams were asked to develop models for the multi-label classification task, which were trained on the CAVES dataset and whose performance was measured over the evaluation dataset described above. Each participating team was able to submit up to 3 runs, e.g., from models with different hyperparameters. They were also free to use other attributes of the tweets (apart from the text) if they wanted, along with other publicly available datasets for training their models.

The submitted runs by the participants were ranked based on their performances on the evaluation dataset. The standard classification metric of Macro-F1 score on the 12 different classes was used for evaluation.

3. Methods - Submitted runs

In the AISoMe track, 22 teams participated this year, and as many as 48 runs were submitted. Most of the teams used NLP pre-processing techniques and a few teams used TF-IDF Vectorizer to extract features. Among the classification techniques, fine-tuned transformer models such as BERT, RoBERTa and Covid-Twitter-BERT (CT-BERT) [4] were used most often by the participating teams. Some teams also employed LLM-based models such as GPT 3.5 and GPT2LMheadmodel. Neural network-based classifiers (MLP) and traditional classifiers (such as Multinomial Naïve Bayes, Support Vector Machines and Multi-Output Classifiers) were also used by some of the teams.

Team Id                          Overview of method                                        Macro-F1
AKCSIT                           Fine tuned CT-BERT                                        0.71
DatawIz                          Fine tuned CT-BERT                                        0.71
IISERBPR-NLP                     Fine tuned BERT with best threshold                       0.70
DSIRC                            Fine tuned CT-BERT                                        0.67
Cognitive Coders                 DeBERTa Large Fine-tuned                                  0.67
TextTitans                       BERT-large uncased                                        0.66
PICT CL LAB Group 1              RoBERTa based model                                       0.65
SSN_IT_Team01                    RoBERTa based model                                       0.65
LLM-geeks                        Intersection of the predictions from DeBERTa and RoBERTa  0.63
SSN_IT_Team02                    RoBERTa based model                                       0.57
Data Warriors                    LLM based model (GPT 3.5)                                 0.55
Alpha Intellect AI               BERT based uncased                                        0.54
S3 Endeavour                     GPT2LMheadmodel                                           0.46
PICT CL Lab                      Support Vector Machine (SVM) model                        0.45
ZSL                              Decision Tree Classifier + Multi Output Classifier        0.43
C3                               RoBERTa based sentence classification                     0.41
APS AI&ML                        Multinomial Naive Bayes, Multi-Output Classifier          0.39
Social Media Data Analysis Team  Classifier chain with Support Vector Machines             0.38
OpenVax                          Multi-Layer Perceptron (MLP) model                        0.37
RANJAN A-MONKA-RESEARCH          CNN-BiLSTM model with GLOVE embeddings                    0.29
Swastik Anupam                   TFIDF-Neural Net                                          0.25
IIIT_SURAT                       SVM models within the Classifier Chain                    0.07

Table 3
Comparison among some of the submitted runs in the classification task. Runs are ranked in decreasing order of Macro F1-score. We are reporting only the best-performing run of each team.
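To make the Macro-F1 scores in Table 3 concrete, the following is a minimal sketch of how such a score can be computed for multi-label predictions: F1 is computed separately for each of the 12 classes and then averaged with equal weight. This is an illustration only, not the track's actual evaluation script. The data layout (one set of label strings per tweet), the lower-cased label names, the function names, and the convention of scoring a class as 0 when it has no true positives are all assumptions made here.

```python
from typing import List, Sequence, Set

# The 12 class labels from Table 1 (lower-cased; naming is an assumption).
LABELS = [
    "conspiracy", "country", "ineffective", "ingredients", "mandatory",
    "pharma", "political", "religious", "rushed", "side-effect",
    "unnecessary", "none",
]


def per_class_f1(gold: List[Set[str]], pred: List[Set[str]], label: str) -> float:
    """F1 for a single label, treated as a yes/no decision per tweet."""
    tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
    if tp == 0:
        # Assumed convention: no true positives means an F1 of 0 for this class.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: List[Set[str]], pred: List[Set[str]],
             labels: Sequence[str] = LABELS) -> float:
    """Unweighted mean of the per-class F1 scores over the given labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of tweets")
    return sum(per_class_f1(gold, pred, label) for label in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical toy data: three tweets with gold and predicted label sets.
    gold = [{"side-effect", "rushed"}, {"political", "country"}, {"none"}]
    pred = [{"side-effect"}, {"political", "country"}, {"conspiracy"}]
    print(macro_f1(gold, pred))  # 3 classes score 1.0, the other 9 score 0.0 -> 0.25
```

Because every class carries equal weight, a rare class that a model never predicts correctly lowers the score as much as a frequent one, which is one reason Macro-F1 is a demanding metric for imbalanced multi-label data.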
The summary of the techniques is reported in Table 3. It is observed that the two best-performing runs (Macro-F1 of 0.71) both used fine-tuned CT-BERT, and that fine-tuned transformer models generally outperformed the traditional and other neural classifiers for our task.

4. Conclusion and Future Directions

The FIRE 2023 AISoMe track compared the performance of various methods for identifying the specific anti-vaccine concerns from tweets. We hope that the test collections developed in this track will be utilized by the research community in the development of better models for this important task in future.

It can be noted that the CAVES dataset also contains explanations for the class labels, as well as summaries for the different anti-vaccine classes (details in [3]). These data can also be utilized for tasks such as explainable tweet classification and tweet summarization in future.

Acknowledgments

The track organizers thank all the participants for their interest in this track, and the FIRE authorities for their support in running the track.

References

[1] S. Poddar, M. Mondal, J. Misra, N. Ganguly, S. Ghosh, Winds of change: Impact of COVID-19 on vaccine-related opinions of Twitter users, in: Proceedings of the International AAAI Conference on Web and Social Media, volume 16, AAAI Press, 2022, pp. 782–793.
[2] L.-A. Cotfas, C. Delcea, I. Roxin, C. Ioanăş, D. S. Gherai, F. Tajariol, The longest month: Analyzing COVID-19 vaccination opinions dynamics from tweets in the month following the first vaccine announcement, IEEE Access 9 (2021) 33203–33223.
[3] S. Poddar, A. M. Samad, R. Mukherjee, N. Ganguly, S. Ghosh, CAVES: A dataset to facilitate explainable classification and summarization of concerns towards COVID vaccines, in: Proceedings of the International ACM SIGIR Conference, 2022.
[4] M. Müller, M. Salathé, P. E. Kummervold, COVID-Twitter-BERT: A natural language processing model to analyse COVID-19 content on Twitter, arXiv preprint arXiv:2005.07503 (2020).