Unveiling Diverse Vaccine Sentiments: Multi-Label
                                Text Classification
                                K Shanmukha Naveen1,0† , S Sharon Roshini2,0† and S Karthika3,0†
                                1
                                  Department of Information Technology, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai
                                2
                                  Department of Information Technology, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai
                                3
                                  Department of Information Technology, Sri Sivasubramaniya Nadar College of Engineering, Kalavakkam, Chennai


                                                                         Abstract
                                                                         This paper presents an analysis of multi-label text classification for antivaccination tweets,
                                                                         using a dataset of around 10,000 tweets across 12 predefined classes. The study’s primary goal was to
                                                                         categorize the various sentiments and concerns expressed in antivaccination tweets on social
                                                                         media. To achieve this, three transformer-based models (BERT, XLNet, and RoBERTa) were
                                                                         fine-tuned for tweet classification. In our experiments, the BERT model achieved the highest
                                                                         accuracy, 0.88, with an F1 macro score of 0.65. The XLNet and RoBERTa models yielded comparatively
                                                                         lower accuracies of 0.87 and 0.83, respectively. This research contributes to the field of natural
                                                                         language processing by highlighting the effectiveness of transformer models (XLNet, RoBERTa, and
                                                                         particularly BERT) in handling multi-label text classification for antivaccination tweets. The
                                                                         insights gained from this study support the use of these transformer models for better
                                                                         understanding public concerns related to vaccination efforts and public health initiatives.

                                                                         Keywords
                                                                         BERT, XLNET, NLTK, RoBERTa, NLP


                                1. Introduction
                                In recent years, the proliferation of social media platforms has given rise to a surge in the
                                dissemination of information and opinions related to various societal issues, including
                                vaccination. Vaccination has become a subject of heated debate, with a growing
                                presence of antivaccination sentiments on these platforms. Understanding and categorizing
                                the sentiments, concerns, and viewpoints expressed in antivaccination tweets is essential for
                                gaining insights into public perceptions and potentially mitigating the adverse effects of vaccine
                                misinformation. This research applies advanced transformer models, including
                                BERT, XLNet, and RoBERTa, in a manner similar to [1] and [2], to multi-label
                                text classification. Our primary objective is to explore the impact of data preprocessing
                                on the accuracy of classifying these tweets. While transformer-based models have
                                demonstrated remarkable success in various natural language processing tasks, their performance
                                Forum for Information Retrieval Evaluation, December 15-18, 2023, India
                                Email: shanmukhanaveen2010809@ssn.edu.in (K. S. Naveen); sharonroshini2010942@ssn.edu.in (S. S. Roshini);
                                skarthika@ssn.edu.in (S. Karthika)
                                URL: https://github.com/shanboii/AISOME23 (K. S. Naveen)
                                ORCID: 0000-0001-8919-5841 (S. Karthika)
                                                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073


can be affected by the quality and characteristics of the input data. Antivaccination tweets
are challenging to classify because of their distinctive language and varying emotions. The overview
of preprocessing techniques in [3] informed our understanding of the role of data preprocessing in
enhancing the models’ predictive capabilities. In addition to investigating data preprocessing, this
research systematically explores a range of hyperparameters, including batch size, decay rate,
learning rate, and number of epochs, to fine-tune the transformer models. By optimizing these
hyperparameters, we aim to improve classification accuracy, enabling more precise categorization of
tweets across multiple predefined classes. By shedding light on the interplay between data
preprocessing, hyperparameter tuning, and transformer-based models, we contribute to advancing the
state of the art in text classification.
   The key contributions of this paper are as follows. First, data preprocessing was applied to
the 9,921 tweets in the provided dataset. Second, we trained three models, namely BERT, XLNet,
and RoBERTa, on the preprocessed dataset for text classification. Lastly, the hyperparameters
were optimized to boost classification accuracy.


2. Related work
The work in [4] focuses on creating a multi-labeled Arabic dataset from COVID-19 tweets, exploring
both traditional machine learning and deep learning approaches to achieve higher accuracy
and stable performance in sentiment analysis and topic classification on Twitter. Another paper
[5] introduces LIAR, which improves biomedical document classification using BioBERT and
an adaptive loss function, surpassing state-of-the-art methods by 1 percent. The authors of [6]
conducted sentiment analysis on Twitter discussions about COVID-19 vaccines. The study in [7]
analyzes COVID-19-related emotions in Twitter data and finds that BERT outperforms the other
models in sentiment analysis and classification. The work in [8] investigated public perceptions of
COVID-19 vaccine adverse effects through social media data, with LSTM achieving the highest
accuracy. The study in [9] uses machine learning and NLP, particularly the BERT model, to analyze
sentiments in COVID-19 vaccination tweets, achieving the highest accuracy among classifiers. In
[10], a Support Vector Machine (SVM) performed best in analyzing COVID-19 vaccine-related tweets,
with an accuracy of 84.32. The work in [11] uses sentiment analysis and machine learning techniques
to analyze public attitudes towards COVID-19 vaccination. The paper in [12] analyzes COVID-19
vaccine perception using Twitter data and deep learning models, with LSTM achieving the highest
accuracy at 85.7 percent. The study in [13] addresses automated topic annotation challenges in
COVID-19 literature, presenting the BioCreative LitCovid dataset and reporting F1 scores of 0.8875
(macro), 0.9181 (micro), and 0.9394 (instance-based) with transformer-based hybrid systems. Finally,
[14] and [15] introduce CAVES, a substantial dataset of COVID-19 anti-vaccine tweets categorized
by specific concerns, and address the classification of vaccine hesitancy on social media.


3. Methodology
The methodology employed for the multi-label text classification of antivaccination tweets
involved a systematic approach encompassing data preprocessing, model implementation, and
evaluation. The study was conducted using two distinct strategies: one utilizing preprocessed
data, and the other utilizing unprocessed data. The following sections outline the key steps
undertaken in this research.

3.1. Data preprocessing
The initial phase of the research involved data preprocessing to ensure the dataset’s suitability for
subsequent analysis. The Natural Language Toolkit (NLTK) was utilized for text preprocessing
tasks such as tokenization, stopword removal, and stemming. The tweets in the dataset were
noisy, containing symbols such as ’@’, brackets, HTML tags, and emojis. These were
removed, and all tweets were converted to lowercase. Additionally, each tweet was categorized
by assigning binary labels (0 or 1) based on its relevance to predefined classes. These twelve
classes, namely mandatory, country, conspiracy, unnecessary, political, ingredients, side-effect,
pharma, none, ineffective, rushed, and religious, were used to classify the tweets in the context
of antivaccination discussions. This preprocessing step aimed to enhance the quality of the
dataset and establish a basis for subsequent model training.
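
The cleaning and labeling steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the regular expressions, the treatment of every non-ASCII character as an emoji, and the order of the class names in the label vector are all assumptions, and the NLTK steps (tokenization, stopword removal, stemming) are left out.

```python
import html
import re

# Class names are taken from the text; their order in the vector is an assumption.
CLASSES = [
    "mandatory", "country", "conspiracy", "unnecessary", "political",
    "ingredients", "side-effect", "pharma", "none", "ineffective",
    "rushed", "religious",
]

HTML_TAG = re.compile(r"<[^>]+>")
MENTION = re.compile(r"@\w+")
BRACKETS = re.compile(r"[\[\](){}<>]")
NON_ASCII = re.compile(r"[^\x00-\x7F]+")  # crude stand-in for emoji removal
WHITESPACE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip HTML tags, @mentions, brackets and emojis, then lowercase."""
    text = html.unescape(text)        # turn entities such as &lt; into characters
    text = HTML_TAG.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = BRACKETS.sub(" ", text)
    text = NON_ASCII.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip().lower()


def encode_labels(labels: list[str]) -> list[int]:
    """Return a 12-dimensional 0/1 vector with one position per class."""
    unknown = set(labels) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in CLASSES]
```

For instance, `clean_tweet("@user Vaccines were <b>RUSHED</b> (really)")` yields `"vaccines were rushed really"`, and a tweet tagged with both "rushed" and "pharma" receives a vector with ones in exactly those two positions.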

3.2. Model Implementation
In this study, three prominent transformer-based models, namely BERT, XLNet, and RoBERTa,
were employed due to their exceptional ability to capture intricate linguistic relationships and
patterns in text data. Each model was explored using two distinct approaches:
   a. Preprocessed Data: The preprocessed dataset was utilized for model training. The tokenized
and labeled tweets, a result of our preprocessing steps, were fed into the transformer models
for fine-tuning. This approach enabled effective classification of tweets into specified classes
based on the learned features and patterns.
   b. Unprocessed Data: Additionally, the unprocessed dataset containing raw text tweets was
employed for model training. This approach aimed to assess the models’ capacity to handle
noisy and unstructured data directly from the source. Training on unprocessed data provided
insights into the models’ ability to effectively process and classify tweets without preprocessing.
   These two approaches were implemented to determine which method resulted in a higher
classification accuracy.

3.3. Evaluation
A comparative analysis was conducted to assess the performance of models trained on two
dataset variants: preprocessed and unprocessed. The primary objective was to understand how
data preprocessing influenced model performance and classification accuracy. Model evaluation
used an 80-20 train-test split, in which 80 percent of the data from both the preprocessed and
unprocessed datasets was designated for training and the remaining 20 percent was allocated
for testing. Here, a test dataset containing 486 records was used for prediction by the models.
The trained models were then applied to predict tweet labels in the test sets. Accuracy and the
F1 macro validation score were used to gauge the models’ classification performance. After
comparing these two metrics, the model with the best performance in classifying antivaccination
tweets was chosen. Because this choice rests on the measured accuracy and F1 macro scores, it
supports accurate classification of the various sentiments and concerns expressed in
antivaccination discussions on social media. The chosen model highlights the progress in natural
language processing and its role in understanding public opinions about vaccination efforts,
especially during the global pandemic.
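
The 80-20 split can be illustrated with a short sketch. The shuffling, the fixed seed, and the function name `train_test_split_80_20` are assumptions made for the example; the paper does not state how records were assigned to each part.

```python
import random


def train_test_split_80_20(records, seed=42):
    """Shuffle a copy of `records` and split it into 80% train and 20% test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    cut = len(shuffled) * 4 // 5            # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


# 9,921 is the number of tweets reported for the provided dataset.
train, test = train_test_split_80_20(range(9921))
```

With 9,921 records this gives 7,936 training and 1,985 test items. The same indices can be reused for the preprocessed and unprocessed variants so that both are evaluated on identical tweets.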


4. Implementation
After the three transformer models (BERT, XLNet, and RoBERTa) were trained on the antivaccination
tweet dataset, an iterative process was initiated to enhance the classification performance.
This iterative approach involved systematically experimenting with various combinations of
hyperparameters, including batch sizes, decay rates, learning rates, and epochs.

4.1. BERT
The BERT model uses a multi-layer bidirectional transformer encoder to represent text, which
enables it to capture the full context of each word in a sentence and significantly enhances its
understanding of text meaning. BERT is a pre-trained model, and this pre-training equips it with
a deep understanding of language structure and meaning. On our dataset, BERT handled tweets well
despite their short and informal nature, and it gave us better accuracy.

4.2. XLNet
XLNet, proposed as an alternative to BERT, employs a permutation-based training objective that
enables it to capture bidirectional context. Unlike BERT, which learns bidirectional context by
masking input tokens, XLNet is trained over permutations of the factorization order of the input
words. To process the tweets in our dataset for classification using XLNet, the tweets first
undergo cleaning to remove irrelevant information. The cleaned text is then broken down into
smaller units called tokens through tokenization. These tokens are numerically encoded using
XLNet’s vocabulary, which assigns a unique ID to each token. These numerical representations are
fed into the XLNet model for classification.
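
The step from tokens to numerical IDs can be pictured with a toy example. The sketch below is a deliberate simplification: it splits on whitespace and builds its own small vocabulary, whereas XLNet itself uses a pretrained SentencePiece subword vocabulary. The function names and the `<unk>` entry are hypothetical.

```python
def build_vocab(texts):
    """Map each distinct whitespace token to an integer ID; 0 is kept for unknowns."""
    vocab = {"<unk>": 0}
    for text in texts:
        for token in text.split():
            vocab.setdefault(token, len(vocab))   # next free ID for a new token
    return vocab


def encode(text, vocab):
    """Replace every token by its ID, falling back to the unknown ID."""
    return [vocab.get(token, vocab["<unk>"]) for token in text.split()]


vocab = build_vocab(["vaccines were rushed", "vaccines are unnecessary"])
ids = encode("vaccines were unsafe", vocab)   # "unsafe" is not in the vocabulary
```

Here `ids` is `[1, 2, 0]`: the two known words keep their IDs and the unseen word maps to the unknown entry. A subword tokenizer would instead split an unseen word into known pieces.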

4.3. RoBERTa
RoBERTa, which stands for Robustly Optimized BERT Pretraining Approach, is an extension of the
BERT architecture. Recognizing that BERT was undertrained despite its remarkable performance,
the authors proposed several modifications. These included more extensive training with larger
batches and more data, eliminating the next sentence prediction objective, and incorporating
dynamic masking during pretraining. Notably, RoBERTa employs a distinct tokenizer, byte-level
BPE, and a larger vocabulary (50k vs. 30k) compared to BERT. Although the larger vocabulary
increases model size, RoBERTa justifies it with significant performance gains across various
tasks.
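
Dynamic masking means that a new masking pattern is drawn each time a sequence is presented to the model, instead of being fixed once during data preparation. The sketch below illustrates only that idea and is an assumption-laden simplification: it masks each token independently with probability 0.15 and omits the random-replacement and keep-unchanged cases used in actual BERT-style pretraining. The function name and the `<mask>` string are illustrative.

```python
import random

MASK = "<mask>"


def dynamic_mask(tokens, rng, prob=0.15):
    """Return a copy of `tokens` in which each position is masked independently."""
    return [MASK if rng.random() < prob else tok for tok in tokens]


tokens = "the vaccine was approved after several trials".split()
rng = random.Random(0)

# One freshly masked view per epoch; a static scheme would reuse a single view.
epoch_views = [dynamic_mask(tokens, rng) for _ in range(3)]
```

Because the generator advances between calls, the three views generally hide different positions, so the model sees varied prediction targets for the same sentence.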
   After training these three transformer models with both preprocessed and unprocessed
data, the hyperparameters, including batch size, decay rate, learning rate, and number of
epochs, were tuned to achieve higher accuracy and a better F1 macro score. This optimization
aimed to refine the models’ predictive capabilities for multi-label text classification, enabling
a more precise categorization of the diverse sentiments, concerns, and viewpoints expressed
within antivaccination discussions on social media.


5. Observations and Results
Our exploration of preprocessed and unprocessed data for multi-label text classification of
antivaccination tweets revealed that the models trained on the unprocessed dataset exhibited
higher accuracies than their preprocessed counterparts. Notably, optimizing hyperparameters,
specifically setting the learning rate to 0.001 or 0.0001, the decay rate to 0.01 or 0.001, the
number of epochs to 5 or 10, and the batch size to 32, contributed significantly to higher
accuracies, F1 scores, and Jaccard scores.
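
One way to enumerate the combinations of the values quoted above is a small grid, sketched below. The paper does not state that an exhaustive grid search was used, so this is only an illustration; `evaluate` is a hypothetical callback standing in for "fine-tune a model with this configuration and return its validation score".

```python
from itertools import product

# Candidate values quoted in the text above.
GRID = {
    "learning_rate": [1e-3, 1e-4],
    "decay_rate": [1e-2, 1e-3],
    "epochs": [5, 10],
    "batch_size": [32],
}


def iter_configs(grid):
    """Yield one dict per combination of hyperparameter values."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def best_config(grid, evaluate):
    """Return the configuration with the highest score under `evaluate`."""
    return max(iter_configs(grid), key=evaluate)


configs = list(iter_configs(GRID))   # 2 * 2 * 2 * 1 = 8 configurations
```

Each of the eight configurations would be passed to a training run, and the one with the best F1 macro score kept.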
   The evaluation metrics used here are the F1 macro score and the Jaccard score.
   The macro F1 score is useful in multi-class classification problems where the classes are
imbalanced. It computes the F1 score of each class separately and averages the results, giving
equal weight to each class. The per-class F1 score is given by the formula:
\[ F_1\ \text{score} = 2 \times \left( \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \right) \tag{1} \]
   The Jaccard score is used for evaluating binary or multi-class classification models. It
focuses solely on the overlap between predicted and actual positive instances. For a set 𝒰 of
predicted positive labels and a set 𝒱 of actual positive labels, it is given by the formula:
\[ \mathrm{Jaccard}(\mathcal{U}, \mathcal{V}) = \frac{|\mathcal{U} \cap \mathcal{V}|}{|\mathcal{U} \cup \mathcal{V}|} \tag{2} \]
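
Both metrics can be computed directly from the two formulas. The sketch below is a plain-Python illustration rather than the evaluation code used in the experiments. Returning 0 when a class has no true positives, and scoring two empty label sets as 1, are conventions assumed here because the formulas leave those cases undefined.

```python
def f1_score(y_true, y_pred):
    """F1 for a single class, from 0/1 lists, following Eq. (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # assumed convention: no true positives gives F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(Y_true, Y_pred):
    """Unweighted mean of the per-class F1 over the label columns."""
    n_classes = len(Y_true[0])
    scores = [
        f1_score([row[c] for row in Y_true], [row[c] for row in Y_pred])
        for c in range(n_classes)
    ]
    return sum(scores) / n_classes


def jaccard(u, v):
    """Eq. (2) for two label sets; two empty sets are scored as 1.0 here."""
    u, v = set(u), set(v)
    if not u and not v:
        return 1.0
    return len(u & v) / len(u | v)
```

As a small check, with true labels `[[1, 0, 1], [0, 1, 0], [1, 1, 0]]` and predictions `[[1, 0, 0], [0, 1, 0], [1, 0, 1]]` the per-class F1 values are 1, 2/3, and 0, so the macro F1 is 5/9. A tweet predicted as {rushed, pharma} whose true label is {rushed} has a Jaccard score of 1/2.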
   Among these hyperparameters, the learning rate, set at values such as 0.001 or 0.0001, was a
critical factor for weight adjustments during training. This parameter played a significant role
in achieving convergence: an appropriate learning rate ensured that the model learned effectively
from the data without overshooting optimal parameter values. The “epochs” parameter dictates the
total number of full passes through the training dataset, enabling models to iteratively learn and
adjust their parameters as they progress through the data. Thus, fine-tuning these
hyperparameters helped the models achieve robust performance, with strong accuracies and F1
scores.
   The experimental results are based on the hyperparameter settings listed in Table 1.
   The models trained on unprocessed data yielded better results; preprocessing the data did
not result in higher accuracy. The unprocessed approach retains the authentic language and
nuances found in tweets, aiding the model in better understanding and classification. Table 2
reports the F1 macro scores and Jaccard scores of the models. Here, model1, model2, and model3
in the table refer to BERT, RoBERTa, and XLNet, respectively.

Table 1
Results based on optimizing hyperparameters
               Model      Learning Rate   Batch Size   Decay Rate   Epochs   Accuracy
               BERT       2e-5            32           0.01         3        0.88
               RoBERTa    2e-5            32           0.01         3        0.87
               XLNet      2e-5            32           0.01         3        0.83

Table 2
Run results
                        Run       Model      F1 Macro score   Jaccard score
                        model1    BERT       0.55             0.57
                        model2    RoBERTa    0.65             0.63
                        model3    XLNet      0.64             0.65


6. Conclusions
To sum up, the study aimed to accurately classify tweets about antivaccination. It was found that
using BERT, especially with unprocessed data, gave the most accurate results. Different
configurations were tested to obtain the best outcomes, with the main focus on achieving the
highest possible F1 macro score for correctly labeling tweets. The careful tuning of the models
resulted in better scores, which confirmed the models’ effectiveness in understanding and
classifying tweets on the topic of antivaccination, even when the conversation is nuanced or
complicated. In conclusion, the research emphasizes the value of careful adjustments to the
models, leading to strong models that can grasp the complexities of discussions about
antivaccination on social media. The strong performance of the BERT model with unprocessed data
highlights its potential in understanding and categorizing complex written content, especially in
public health conversations.


References
 [1] L. Cai, Y. Song, T. Liu, K. Zhang, A hybrid bert model that incorporates label semantics via
     adjustive attention for multi-label text classification, IEEE Access 8 (2020) 152183–152192.
     doi:10.1109/ACCESS.2020.3017382.
 [2] W.-C. Chang, H.-F. Yu, K. Zhong, Y. Yang, I. S. Dhillon, Taming pretrained transformers
     for extreme multi-label text classification, in: Proceedings of the 26th ACM SIGKDD
     international conference on knowledge discovery & data mining, 2020, pp. 3163–3171.
 [3] S. Vijayarani, M. J. Ilamathi, M. Nithya, et al., Preprocessing techniques for text mining-an
     overview, International Journal of Computer Science & Communication Networks 5 (2015)
     7–16.
 [4] F. M. Alderazi, A. A. Algosaibi, M. A. Alabdullatif, Multi-labeled dataset of arabic covid-19
     tweets for topic-based sentiment classifications, in: 2022 IEEE International Conference on
     Evolving and Adaptive Intelligent Systems (EAIS), 2022, pp. 1–8.
     doi:10.1109/EAIS51927.2022.9787700.
 [5] Z. Chen, J. Peng, Learning label independence and relevance for multi-label biomedical text
     classification, in: 2022 IEEE International Conference on Systems, Man, and Cybernetics
     (SMC), 2022, pp. 2776–2781. doi:10.1109/SMC53654.2022.9945404.
 [6] Z. Xu, L. Shi, Y. Wang, J. Zhang, L. Huang, C. Zhang, S. Liu, P. Zhao, H. Liu, L. Zhu,
     Y. Tai, C. Bai, T. Gao, J. Song, P. Xia, J. Dong, J. Zhao, F.-S. Wang, Pathological findings
     of covid-19 associated with acute respiratory distress syndrome, The Lancet Respiratory
     Medicine 8 (2020) 420–422. URL: https://www.sciencedirect.com/science/article/pii/S221326002030076X.
     doi:10.1016/S2213-2600(20)30076-X.
 [7] V. Battula, S. G. Goli, J. Nasigari, Identification of optimal model for multi-class classification
     of covid tweets, in: 2022 9th International Conference on Computing for Sustainable Global
     Development (INDIACom), 2022, pp. 495–499. doi:10.23919/INDIACom54597.2022.9763291.
 [8] K. Kariyapperuma, K. Banujan, P. Wijeratna, B. Kumara, Classification of covid19 vaccine-
     related tweets using deep learning, in: 2022 International Conference on Data Analytics for
     Business and Industry (ICDABI), 2022, pp. 1–5. doi:10.1109/ICDABI56818.2022.10041615.
 [9] S. Ningombam, A. Roy, P. Debnath, An Empirical Analysis of Different Classifiers on
     COVID-19 Vaccination Data, Springer Nature Singapore, Singapore, 2023, pp. 285–295.
     URL: https://doi.org/10.1007/978-981-19-9304-6_28. doi:10.1007/978-981-19-9304-6_28.
[10] S. K. Akpatsa, X. Li, H. Lei, V.-H. K. S. Obeng, Evaluating public sentiment of covid-19
     vaccine tweets using machine learning techniques, Informatica 46 (2022).
[11] N. Gao, Text Analysis of Twitter Data for COVID-19 Vaccines, Ph.D. thesis, Instytut
     Informatyki, 2023.
[12] K. T. Shahriar, M. N. Islam, M. M. Anwar, I. H. Sarker, Covid-19 analytics: Towards the
     effect of vaccine brands through analyzing public sentiment of tweets, Informatics in
     medicine unlocked 31 (2022) 100969.
[13] Q. Chen, A. Allot, R. Leaman, R. Islamaj, J. Du, L. Fang, K. Wang, S. Xu, Y. Zhang,
     P. Bagherzadeh, S. Bergler, A. Bhatnagar, N. Bhavsar, Y.-C. Chang, S.-J. Lin, W. Tang,
     H. Zhang, I. Tavchioski, S. Pollak, S. Tian, J. Zhang, Y. Otmakhova, A. J. Yepes,
     H. Dong, H. Wu, R. Dufour, Y. Labrak, N. Chatterjee, K. Tandon, F. A. A. Laleye,
     L. Rakotoson, E. Chersoni, J. Gu, A. Friedrich, S. C. Pujari, M. Chizhikova,
     N. Sivadasan, S. VG, Z. Lu, Multi-label classification for biomedical literature: an
     overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations,
     Database 2022 (2022) baac069. URL: https://doi.org/10.1093/database/baac069.
     doi:10.1093/database/baac069.
     arXiv:https://academic.oup.com/database/article-pdf/doi/10.1093/database/baac069/45629681/baac069.pdf.
[14] S. Poddar, A. M. Samad, R. Mukherjee, N. Ganguly, S. Ghosh, Caves: A dataset to facilitate
     explainable classification and summarization of concerns towards covid vaccines, in:
     Proceedings of the 45th International ACM SIGIR Conference on Research and Development
     in Information Retrieval, 2022, pp. 3154–3164.
[15] S. Poddar, M. Basu, K. Ghosh, S. Ghosh, Overview of the fire 2023 track: artificial intelligence
     on social media (aisome), in: Proceedings of the 15th Annual Meeting of the Forum for
     Information Retrieval Evaluation, 2023.