<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Multi-label Classification of Vaccine Sentiments in Social Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Somsubhra De</string-name>
          <email>somsubhra@outlook.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shaurya Vats</string-name>
          <email>shauryavats56@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Forum for Information Retrieval Evaluation</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Indian Institute of Technology Kharagpur</institution>
          ,
          <addr-line>West Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Indian Institute of Technology Madras</institution>
          ,
          <addr-line>Tamil Nadu</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>In the realm of public health, vaccination stands as the cornerstone for mitigating disease risks and controlling their proliferation. The recent COVID-19 pandemic has highlighted the crucial role vaccines play in keeping us safe. However, the situation involves a mix of perspectives, with skepticism towards vaccines prevailing for various reasons, such as political dynamics, apprehensions about side effects, and more. The paper addresses the challenge of comprehensively understanding and categorizing these diverse concerns expressed in the context of vaccination. Our focus is on developing a robust multi-label classifier capable of assigning specific concern labels to tweets based on the articulated apprehensions towards vaccines. To achieve this, we delve into the application of a diverse set of advanced natural language processing techniques and machine learning algorithms, including transformer models like BERT, the state-of-the-art GPT-3.5, Classifier Chains &amp; traditional methods like SVM, Random Forest and Naive Bayes. We see that the cutting-edge large language model outperforms all other methods in this context.</p>
      </abstract>
      <kwd-group>
        <kwd>classification</kwd>
        <kwd>LLM</kwd>
        <kwd>Prompt engineering</kwd>
        <kwd>Transformer models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>In the wake of the global COVID-19 pandemic, misinformation and vaccine hesitancy have emerged as significant societal challenges. The spread of anti-vaccine sentiment on social media platforms has amplified these concerns. We address the complex task of multi-label classification of anti-vax tweets, aiming to identify and categorize the diverse spectrum of vaccine-related misinformation in the era of COVID-19. Through experimental evaluation, we demonstrate the efficacy of our proposed approach: our results not only underscore the prevalence of vaccine-related concerns in social media discussions but also shed light on the nuanced nature of these concerns. The work provides valuable insights for policymakers and health organizations to communicate and intervene where required, and emphasizes the significance of utilizing real-time social media discussions to inform evidence-based strategies that address public concerns and promote informed decision-making.</p>
      <p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>All the materials, including code, datasets, and related resources for our work, are accessible in
this repository.</p>
    </sec>
    <sec id="sec-3">
      <title>2. Understanding the Task</title>
      <p>Given a tweet's content, it can be classified into categories from the following list of 12 classes (each class refers to a different concern or reason against the use of vaccines):
• Unnecessary - The tweet indicates that vaccines are unnecessary or that alternate cures are better
• Mandatory - The tweet is against mandatory vaccination
• Pharma - The tweet indicates that the big pharmaceutical companies are just trying to earn money, or it is against such companies in general because of their history
• Conspiracy - The tweet suggests some deeper conspiracy, and not just that Big Pharma wants to make money (e.g. vaccines are being used to track people, COVID is a hoax)
• Political - The tweet expresses concerns that the governments or politicians are pushing their own agenda through the vaccines
• Country - The tweet is against some vaccine because of the country where it was developed or manufactured
• Rushed - The tweet expresses concerns that the vaccines have not been tested properly or that the published data is not accurate
• Ingredients - The tweet expresses concerns about the ingredients present in the vaccines (e.g. fetal cells, chemicals) or the technology used (e.g. mRNA vaccines can change the DNA)
• Side-effect - The tweet expresses concerns about the side effects of the vaccines, including deaths caused
• Ineffective - The tweet expresses concerns that the vaccines are not effective enough and are useless
• Religious - The tweet is against vaccines because of religious reasons
• None - No specific reason is stated in the tweet, or some reason other than the given ones
As this task involves multi-label classification, each tweet can be mapped to more than one label, contingent on the stances expressed within the text. Below are a few tweet examples, each accompanied by associated labels and detailed descriptions to provide better context.</p>
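      <p>As a sketch of how these per-tweet label lists can be turned into multi-hot target vectors for training a multi-label classifier (a minimal illustration using scikit-learn; the example tweets are invented, and the label list is taken from the task description above):</p>

```python
# Minimal sketch: turning per-tweet label lists into multi-hot target vectors.
from sklearn.preprocessing import MultiLabelBinarizer

CONCERNS = ["unnecessary", "mandatory", "pharma", "conspiracy", "political",
            "country", "rushed", "ingredients", "side-effect", "ineffective",
            "religious", "none"]

mlb = MultiLabelBinarizer(classes=CONCERNS)
Y = mlb.fit_transform([["side-effect", "pharma", "rushed"],   # one tweet, three concerns
                       ["ineffective"]])                      # another tweet, one concern
# Each row of Y is a 12-dimensional 0/1 vector aligned with CONCERNS.
```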
    </sec>
    <sec id="sec-4">
      <title>3. Dataset</title>
      <p>
        The training dataset, comprising 9,921 anti-vaccine tweet texts (posted during 2020-21) with
corresponding tweet IDs and annotated labels, was sourced from the CAVES[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] Dataset. The
test set consists of 486 tweets (along with their IDs), encompassing various vaccine types such
as MMR, flu vaccines and more, in addition to COVID-19 vaccines.
      </p>
      <sec id="sec-4-1">
        <title>Labels</title>
        <p>• Labels: [side-effect, mandatory, political]
• “My Take On The new vaccines? 1. I don’t trust pharmaceutical companies 2. Only 61% of public will get vaccinated 3. Problems with people with allergies 4. They’re not free 5. No data on longevity 6. Side effects? 7. Children under 16 were not tested in the Pfizer trial https://t.co/vZ4ZPkroc4” - Labels: [side-effect, pharma, rushed]
• “Why would we get a vaccine with so many side effects? Ccp covid has a 99.7% survival rate, so why get a vaccine? That’s stupid and dangerous.” - Labels: [side-effect, unnecessary]
• “@TorontoStar Then I suppose there is no hope for a vaccine. Nothing coming and it won’t work if it does come. It is Lockdowns Forever, or take your chances with Covid. Lockdowns are no life at all. My choice it to take my chances. Bring it on!” - Labels: [ineffective]
• “Oh my! One doesn’t have to be an expert at reading body language to know he’s covering something up. Depopulation is his game.” - Labels: [conspiracy]</p>
      </sec>
      <sec id="sec-4-2">
        <title>Tweet Description</title>
        <p>The tweet reports FDA’s
announcement of two deaths
during Pfizer vaccine trials
due to ‘serious adverse events’
and mentions that members
of the Democratic party in
New York are expressing their
strong opposition to the idea
of mandatory or compulsory
vaccinations.</p>
        <p>The tweet expresses
skepticism about new vaccines,
citing distrust of pharmaceutical
companies, concerns about
side effects of rushed vaccines,
and other issues, particularly
focusing on the lack of
comprehensive testing.</p>
        <p>The tweet questions the need
for vaccines due to
perceived side effects and argues
against them, citing a high
survival rate.</p>
        <p>The tweet expresses
skepticism about the effectiveness
of vaccines, suggesting that
there’s no hope for them
and indicating a preference
for taking their chances with
COVID-19 over enduring
continuous lockdowns.</p>
        <p>The tweet implies a
conspiracy claiming that someone is
involved in an agenda,
planning to reduce the global
population intentionally.</p>
        <sec id="sec-4-2-1">
          <title>3.1. Some Insights about the Train Set</title>
          <p>Performing an EDA, we observe that the dataset exhibits class imbalance, with certain labels having significantly higher counts than others. The two primary concerns, side-effect and ineffective, appear to dominate the dataset, whereas vaccine skepticism due to religious reasons and concerns related to the country of origin are less prevalent.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Our Methodology</title>
      <sec id="sec-5-1">
        <title>4.1. Data Pre-processing</title>
        <p>To enhance the model’s performance, we conducted comprehensive data pre-processing on both the training and test datasets. This involved meticulous cleaning to eliminate noisy data, and various other steps ensuring that the model’s integrity remained intact.
• Stop word removal: elimination of commonly occurring English stop words (e.g., ‘a’, ‘an’, ‘the’, etc., which do not add much meaning to the text) as per the NLTK library
• Lower-casing text: conversion of the tweet text to lowercase, ensuring consistent, uniform handling of words
• Removal of punctuation and Twitter handles or usernames (‘@’)
• Removal of non-alphanumeric characters, excluding hashtags
• Emoji translation: converting emojis to their corresponding textual representations
• Contraction expansion: expansion of contractions to their full word forms, enhancing text clarity and comprehensibility
• Removal of URLs, as they do not contribute to sentiment analysis</p>
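        <p>The steps above can be sketched as follows (a minimal illustration using a tiny hand-picked stop-word list rather than the full NLTK one; emoji translation and contraction expansion are omitted for brevity):</p>

```python
import re

STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to"}  # tiny illustrative subset of NLTK's list

def clean_tweet(text):
    text = re.sub(r"http\S+", " ", text)       # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop Twitter handles/usernames
    text = text.lower()                        # lower-case for uniform handling
    text = re.sub(r"[^a-z0-9#\s]", " ", text)  # strip punctuation/non-alphanumerics, keep hashtags
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

clean_tweet("@TorontoStar The vaccine is RUSHED! https://t.co/xyz #covid")
# returns "vaccine rushed #covid"
```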
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Tokenization &amp; Lemmatization</title>
        <p>Tokenization refers to the process of breaking a sequence of text down into smaller units, or tokens. It is crucial for feature extraction, dimensionality reduction, normalization and semantic understanding. The tweets have been tokenized using the nltk.tokenize package. Lemmatization is a text normalization technique: it is the process of reducing words to their meaningful base form (lemma) to ensure variants of a word are treated as a single item for analysis or retrieval purposes. We employed the WordNetLemmatizer from the NLTK library for this task. We chose lemmatization over an alternative technique called stemming for a specific reason. Stemming involves chopping off the ends of words to reduce them to a common root. While this can be useful in some cases, it is not ideal when preserving the full meaning of words is important, since stemming can truncate words too aggressively, resulting in a loss of meaning. In our context, where understanding the context of text is crucial, lemmatization helps us retain more of the original word’s meaning while still achieving normalization. Additionally, we integrated Part-of-Speech (PoS) tagging into our text processing pipeline, which involves assigning a grammatical category (noun, verb, adjective, etc.) to each word in the text. This further enhances our ability to understand the context and helps us make more precise decisions during lemmatization.</p>
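        <p>A minimal sketch of the PoS-aware lemmatization step (the tag mapping below assumes NLTK's Penn Treebank tagger output; the WordNetLemmatizer call is shown in a comment for context):</p>

```python
# Map Penn Treebank PoS tags (as produced by nltk.pos_tag) to the
# single-letter WordNet categories expected by WordNetLemmatizer.
def penn_to_wordnet(tag):
    if tag.startswith("J"):
        return "a"  # adjective
    if tag.startswith("V"):
        return "v"  # verb
    if tag.startswith("R"):
        return "r"  # adverb
    return "n"      # noun (WordNetLemmatizer's default)

# Usage with NLTK (requires the wordnet corpus and the tagger to be downloaded):
#   from nltk.stem import WordNetLemmatizer
#   lemma = WordNetLemmatizer().lemmatize("tested", penn_to_wordnet("VBD"))
```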
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Models</title>
      <p>
        5.1. LLM
We have employed Large Language Models (LLMs) to address a particularly intricate and under-explored realm, multi-label classification, within the context of social media tweets on vaccines. Our exploration involved the utilization of diverse prompting methods, such as chain-of-thought (CoT) style[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], zero-shot &amp; few-shot learning. Through rigorous experimentation, we discovered that an amalgamation of these prompting styles proved most effective for our objectives. Subsequently, we formulated a novel prompt template tailored specifically for multi-label text classification tasks with LLMs. We noted an enhancement in model performance when prompts requiring the model to generate explanations or reasoning were incorporated: the rationale behind the model’s assignment of labels to a tweet exhibited a more coherent and meaningful pattern. A selection of instances of the model’s reasoning is provided in Table 2 for reference. Two example tweets and their predicted labels:
• “am apreciat un videoclip pe treatment and cure for gardasil hpv vaccine injury here leylas story” - Labels: [‘side-effect’]
• “oral polio vaccine infecting unvaccinated kid via #vet #christian #writer #scifi #trump #maga” - Labels: [‘ingredients’, ‘religious’]
      </p>
      <sec id="sec-6-1">
        <title>Explanation Generated</title>
        <p>The tweet expresses concern
about the potential side
effects of the HPV vaccine,
specifically linking it to a
historically high infertility rate.
The tweet also mentions the
keywords ‘spirituality’ and
‘self’, which suggests that the
tweet is expressing opposition
to the vaccine based on
religious beliefs.</p>
        <p>The tweet mentions “HPV vaccine injury” and “Leyla’s story”, which suggests that the tweet is expressing worries about the adverse effects of the HPV vaccine. The keywords “injury” and “story” are associated with the “Side-effect” label, which is why this label is applicable to the tweet.</p>
        <p>The tweet mentions the words ‘oral polio vaccine’ and ‘unvaccinated kid’, which suggests that the tweet is raising concerns about the ingredients present in the vaccine and its potential effects on unvaccinated children. The tweet also includes the words ‘Christian’ and ‘Trump’, which implies that the tweet is expressing opposition to the vaccine based on religious beliefs. The keywords ‘oral polio vaccine’, ‘unvaccinated kid’</p>
      </sec>
      <sec id="sec-6-2">
        <title>5.1.1. Template</title>
        <p>
          We used the text-davinci-003 engine, which belongs to the GPT-3.5[
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] family, with configuration temperature = 0 and max_tokens = 100. Our prompt template is outlined as follows:
prompt = f"""
Task: You are given a tweet on vaccines delimited by triple backticks(```). Your task is to pick the right Concern Labels for the tweet from the provided options. Also, explain the logical reasoning you took to choose a label, including the relevant words from the tweets. Learn from examples by understanding common words, keywords, and feelings associated with each label. Only choose labels you're confident about. Each label has a description and common keywords to help you understand the concern that the tweet is depicting.
###
Format of response (Response should include Concern in the same format as in examples):
Concern: [List of all the relevant applicable concern labels]
Reasoning: [logical reasoning followed to decide each of the applicable labels]
Tweet: ```{Tweet}```
Note: Include only the most relevant concern labels in your response. Understand and analyze the sentiment and hidden meanings associated with the given tweet and compare it with the sentiments and keywords in the examples before responding. Comprehend the descriptions and keywords associated with each concern label and then assess the similarity with the given tweet's meaning and these concerns. Verify each and every label before responding to increase the prediction accuracy.
"""
        </p>
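        <p>A minimal sketch of how such a template can be instantiated and sent to the model (the completion call mirrors the engine and configuration stated above; the helper name and example tweet are illustrative, and the label descriptions and few-shot examples are elided):</p>

```python
TICKS = "`" * 3  # triple backticks used as tweet delimiters in the prompt

def build_prompt(tweet):
    """Fill a shortened version of the prompt template with a single tweet."""
    return (
        "Task: You are given a tweet on vaccines delimited by triple backticks. "
        "Pick the right Concern Labels and explain your reasoning.\n"
        "Format of response:\n"
        "Concern: [List of all the relevant applicable concern labels]\n"
        "Reasoning: [logical reasoning followed to decide each label]\n"
        f"Tweet: {TICKS}{tweet}{TICKS}\n"
    )

# With the legacy OpenAI completions API (openai 0.x), the call would look like:
#   openai.Completion.create(engine="text-davinci-003",
#                            prompt=build_prompt(tweet),
#                            temperature=0, max_tokens=100)
```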
      </sec>
      <sec id="sec-6-3">
        <title>5.1.2. Proposed Prompt Template Approach</title>
        <p>In the context of addressing a 12-class multi-label classification problem, we propose a structured
approach that includes the following essential components:
Task Specification: It is imperative to commence by precisely articulating the classification
task under consideration. In our case, this involves the categorization of Social Media Tweets
concerning vaccines into multiple concern categories.</p>
        <p>Option Presentation: Subsequently, we put forth a comprehensive array of the 12 distinct
concern labels for the model’s consideration.</p>
        <p>
          Explanatory Context for Options: To facilitate the model’s comprehension, we provide
detailed explanations and descriptions for each of the concern labels. These descriptions not
only convey the essence of each concern but also elucidate common keywords (as per table 6[
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
along with some more additions to it), sentiments, or themes that are typically associated with
them.
        </p>
        <p>Learning Through Examples: An integral aspect of our approach involves training the model
through illustrative examples. These examples consist of actual tweets paired with their
corresponding concern labels. Additionally, we furnish comprehensive explanations accompanying
these examples, elucidating the logical reasoning behind the assignment of specific labels to
each tweet.</p>
        <p>In the pursuit of optimizing the model performance, we maintain records of the specific
adjustments and enhancements deemed necessary after testing. These modifications include various
aspects, such as refining keyword lists, fine-tuning model parameters, and adapting the training
strategy.</p>
        <p>It is important to emphasize that our methodology transcends mere label assignment: we challenge the model not only to provide labels but also to substantiate its choices through logical reasoning or explanations. This approach fosters a deeper understanding of the model’s decision-making process, enabling us to elucidate the thought patterns underlying its multi-label classification predictions.</p>
        <sec id="sec-6-3-1">
          <title>5.2. Transformer based models</title>
        </sec>
      </sec>
      <sec id="sec-6-4">
        <title>5.2.1. DistilBERT &amp; BERT</title>
        <p>
          DistilBERT (a distilled version of BERT, Bidirectional Encoder Representations from Transformers) is a widely used NLP model. DistilBERT base uncased[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] (which has been used) is a lighter and faster variant of the BERT model that retains much of its capability to understand and generate human-like text. One-hot encoding was applied to the pre-processed train set. For the first run, the 9,921 tweets from the original training dataset were well-shuffled and split into training and validation sets in an 80:20 ratio: the 7,936 training tweets and their ground-truth labels were used to fine-tune the pre-trained model, and the 1,985 validation tweets were used for evaluation. This model was then run to predict the classes for the test set, with the hyper-parameters: max len = 512, train batch size = 16, learning rate = 1e-05, num workers = 2 and no. of epochs = 10. For the second run, the model was fine-tuned on the entire train set (without any split). A dropout rate of 0.5, weight decay of 0.001, learning rate of 1e-4, batch size of 16 and threshold value of 0.5 were applied, and the model was trained for 10 epochs. These hyper-parameter values were carefully chosen to collectively improve training efficiency and performance (e.g. by mitigating overfitting).
        </p>
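        <p>The 80:20 shuffle-and-split described above can be reproduced with a small helper (a sketch; the seed and function name are illustrative):</p>

```python
import random

def train_val_split(items, train_frac=0.8, seed=42):
    """Shuffle a dataset and split it into train/validation portions."""
    rng = random.Random(seed)   # fixed seed for a reproducible shuffle
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    return shuffled[:n_train], shuffled[n_train:]

train, val = train_val_split(range(9921))
# 9921 tweets split 80:20 gives 7936 for training and 1985 for validation
```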
        <p>
          BERT base uncased[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] was also tried (on 80% training &amp; 20% validation data) with the
hyperparameters: max len = 200, train batch size = 8, learning rate = 1e-05 and no. of epochs
= 10. BERT base uncased outperformed DistilBERT on the validation set, achieving a macro
F1 score of 0.83 after 10 epochs of training. However, when applied to the test set, BERT
faced challenges in predicting classes for a significant number of cases, unlike DistilBERT.
The superior performance on the validation set can be attributed to BERT’s larger capacity to
learn from the training data. Yet, this advantage might’ve also made BERT more susceptible to
overfitting, where it closely tailors its predictions to the training data, making it less adaptable
to the new test data. It’s possible that the hyper-parameters chosen during training favored
BERT’s performance on the validation set but were less suitable for the test set.
        </p>
        <sec id="sec-6-4-1">
          <title>5.3. Traditional Methods</title>
          <p>We also explored some traditional machine learning techniques, namely Multinomial Naive Bayes (NB), Random Forest and Support Vector Machine (SVM) with TF-IDF vectorization, on this task. However, it became evident during our experiments that these traditional methods struggled to effectively capture the complexity of the problem. This conclusion was drawn from the validation scores, which consistently demonstrated limitations in their performance.</p>
        </sec>
      </sec>
      <sec id="sec-6-5">
        <title>5.3.1. Feature extraction</title>
        <p>TF-IDF: Term Frequency-Inverse Document Frequency is a widely used text vectorization technique in NLP and information retrieval. It helps in capturing the importance of words in a document (d) and across a corpus. TF measures the frequency of a term (t) within a document, while IDF measures the importance of a term across a collection of documents:</p>
        <p>TF(t, d) = (Number of times term t appears in document d) / (Total number of terms in document d)</p>
        <p>IDF(t) = ln(Total number of documents in corpus / Number of documents containing term t)   (1)</p>
        <p>TF-IDF is the product of TF &amp; IDF, which assigns a weight to each term in a document. Rare words that appear in specific documents will have a high TF-IDF score.</p>
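        <p>Equation (1) can be sketched directly (a minimal illustration with a toy two-document corpus; note that library implementations such as scikit-learn's TfidfVectorizer use a smoothed variant of this formula):</p>

```python
import math
from collections import Counter

def tf(term, doc):
    """Term frequency: share of tokens in doc equal to term."""
    return Counter(doc)[term] / len(doc)

def idf(term, corpus):
    """Inverse document frequency with the natural log, as in Eq. (1)."""
    containing = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / containing)

def tf_idf(term, doc, corpus):
    return tf(term, doc) * idf(term, corpus)

corpus = [["vaccine", "rushed"], ["vaccine", "safe"]]
tf_idf("rushed", corpus[0], corpus)   # 0.5 * ln(2): a rare term gets a high weight
tf_idf("vaccine", corpus[0], corpus)  # 0.5 * ln(1) = 0: a term in every document
```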
        <sec id="sec-6-5-1">
          <title>5.4. Classifier Chains</title>
          <p>
            Classifier chains [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ][
            <xref ref-type="bibr" rid="ref7">7</xref>
            ] is a machine learning method for problem transformation in multi-label classification. It extends Binary Relevance, where each label is treated as a separate binary classification problem, by linking the per-label binary classifiers into a chain so that each classifier also receives the predictions of the preceding classifiers as additional input features, allowing correlations between labels to be captured.
          </p>
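          <p>A minimal sketch with scikit-learn's ClassifierChain (the base estimator and toy data are illustrative; each link in the chain sees the earlier labels' predictions as extra features):</p>

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

# Toy multi-label data: 2 features, 3 binary labels per sample.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]] * 5)
Y = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]] * 5)

# order=[0, 1, 2]: label 0 is predicted first, then its prediction is fed to
# label 1's classifier, and so on down the chain.
chain = ClassifierChain(LogisticRegression(), order=[0, 1, 2])
chain.fit(X, Y)
preds = chain.predict(X)  # shape (20, 3), one 0/1 column per label
```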
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Results</title>
      <sec id="sec-7-1">
        <title>6.1. Metrics</title>
        <p>Precision (P): It is the measure of the accuracy of positive predictions made by the model. It can be expressed as</p>
        <p>P = TP / (TP + FP)</p>
        <p>where TP and FP refer to true positives (number of correctly predicted positive instances) and false positives (number of incorrectly predicted positive instances) respectively.</p>
        <p>Recall (R): It answers what proportion of actual positives was identified correctly. It can be expressed as</p>
        <p>R = TP / (TP + FN)</p>
        <p>where FN refers to false negatives (number of incorrectly predicted negative instances).</p>
        <p>F-score (F1): It is the harmonic mean of precision and recall. It is a useful metric as it provides a more balanced summarization of model performance, considering both the P and R values. It can be expressed as</p>
        <p>F1 = 2 × (P × R) / (P + R)</p>
        <p>In terms of TP, FP and FN, the alternative equation is</p>
        <p>F1 = TP / (TP + (1/2) × (FP + FN))</p>
        <p>These values are scaled between 0 and 1, with 1 signifying the highest achievable score and 0 denoting the poorest performance.</p>
        <p>Jaccard Score: Also known as the Jaccard Index or Jaccard Similarity Coefficient, it is a measure of the similarity between two sets. It is defined as the size of the intersection of the sets divided by the size of the union of the sets. In the context of our classification task, it can be represented as</p>
        <p>J(y_true, y_pred) = |y_true ∩ y_pred| / |y_true ∪ y_pred|</p>
        <p>where y_true and y_pred represent the ground truth &amp; predicted labels respectively. The Jaccard Score ranges between 0 and 1, where a score of 1 indicates that the predicted labels perfectly match the true labels (there are no false positives or false negatives) &amp; 0 means that there is no overlap between the predicted and true labels, indicating complete dissimilarity.</p>
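        <p>These metrics can be computed for multi-label predictions as follows (a sketch with toy label matrices; only 3 of the 12 concern labels are shown, and the values are illustrative):</p>

```python
import numpy as np
from sklearn.metrics import f1_score, jaccard_score

# Toy multi-label ground truth and predictions (rows = tweets, columns = labels).
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0]])

macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)  # 5/9 ~ 0.556
jaccard = jaccard_score(y_true, y_pred, average="samples")             # 2/3 ~ 0.667
```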
      </sec>
      <sec id="sec-7-2">
        <title>6.2. Final Outcomes</title>
        <p>Out of all the methods we experimented with, the ones sent to the track include the final
predictions made by the LLM (run 3) and DistilBERT (runs 1 &amp; 2, as mentioned in 5.2.1) on the
test set.</p>
        <p>
          The macro F1[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and Jaccard[9] scores from the scikit-learn library are the metrics used to
evaluate the performance of the model. The scores are computed based on the predicted labels
on the test set. LLM achieved a slightly higher macro F1 score of 0.55, outperforming the
transformer models like DistilBERT, albeit with a narrow margin.
        </p>
        <p>Coming to the class-wise stats, we see that most of the models tend to struggle with two specific classes, ‘none’ and ‘conspiracy’. Additionally, the ‘country’ and ‘religious’ classes, which have relatively fewer examples in the dataset, also tend to result in lower model performance. We will be able to share the results for our other model predictions on the test data once we have the ground truth labels (which will be revealed after the conference concludes) for the test set.</p>
      </sec>
    </sec>
    <sec id="sec-7a">
      <title>7. Limitations of the Study &amp; Future Aspects</title>
      <p>• Due to constraints of not having an OpenAI paid subscription, the model was trained using a subset of only 58 cases randomly chosen (thoughtfully selected so that examples from all 12 classes were present) from the pre-processed train set. In the future, a more extensive exploration and understanding of diverse prompting strategies could yield even better results. It is important to highlight that the LLM has proved its effectiveness on this complex multi-label classification task, which suggests that with additional resources there is significant potential for further enhancing the model’s performance.
• The exploration of advanced models, spanning a broader range of epochs and configurations, was hindered by limitations in GPU resources.
• We observed that the model adhered to the provided concern labels. The structured prompts, incorporating keywords and notes, proved highly effective in reducing hallucinations. This approach empowered the model to develop a nuanced understanding of each concern label, and its responses were logically aligned with this comprehension. However, there are instances where the model exhibited hallucinations while generating explanations, producing false yet credible-sounding content. The fluency and quality of the non-factual content generated by the models are intriguing &amp; require further study to deduce patterns and understand the model’s thought process. The findings from (Augenstein et al., 2023)[10] shed light on similar concerns related to large language model behaviour.
• The incorporation of sentiments into responses, particularly within the domain of large language models, stands as an unexplored area. While this is a preliminary investigation, it provides insights into possible implications and suggests areas for future research.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>We would like to extend our sincere gratitude to the organizers of the Artificial Intelligence on Social Media (AISoMe) track[11] at the FIRE 2023 conference for hosting this brilliant &amp; insightful problem statement, which served as a great learning experience. We’re grateful for the opportunity to engage with it.</p>
      <p>[9] Website. URL: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.jaccard_score.html.</p>
      <p>[10] I. Augenstein, T. Baldwin, M. Cha, T. Chakraborty, G. L. Ciampaglia, D. Corney, R. DiResta, E. Ferrara, S. Hale, A. Halevy, E. Hovy, H. Ji, F. Menczer, R. Miguez, P. Nakov, D. Scheufele, S. Sharma, G. Zagni, Factuality Challenges in the Era of Large Language Models, 2023. arXiv:2310.05189.</p>
      <p>[11] S. Poddar, M. Basu, K. Ghosh, S. Ghosh, Overview of the FIRE 2023 Track: Artificial Intelligence on Social Media (AISoMe), in: Proceedings of the 15th Annual Meeting of the Forum for Information Retrieval Evaluation, 2023.</p>
      <p>[12] B. Zhang, D. Ding, L. Jing, How would Stance Detection Techniques Evolve after the Launch of ChatGPT?, 2023. arXiv:2212.14548.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Poddar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Samad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          , S. Ghosh,
          <article-title>CAVES: A Dataset to facilitate Explainable Classification and Summarization of Concerns towards COVID Vaccines</article-title>
          ,
          <source>in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>3154</fpage>
          -
          <lpage>3164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mishra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Qiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-W.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-C.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Tafjord</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kalyan</surname>
          </string-name>
          ,
          <article-title>Learn to explain: Multimodal Reasoning via Thought Chains for Science Question Answering</article-title>
          ,
          <source>in: The 36th Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Website</surname>
          </string-name>
          . URL: https://platform.openai.com/docs/models/gpt-3-5.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Website</surname>
          </string-name>
          . URL: https://huggingface.co/distilbert-base-uncased.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Website</surname>
          </string-name>
          . URL: https://huggingface.co/bert-base-uncased.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Read</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Holmes</surname>
          </string-name>
          , E. Frank,
          <article-title>Classifier Chains for Multi-label Classification</article-title>
          ,
          <year>2009</year>
          . URL: https://www.cs.waikato.ac.nz/~eibe/pubs/chains.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Read</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pfahringer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Holmes</surname>
          </string-name>
          , E. Frank,
          <source>Classifier chains: A Review and Perspectives</source>
          ,
          <year>2020</year>
          . URL: https://doi.org/10.48550/arXiv.1912.13405.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Website</surname>
          </string-name>
          . URL: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>