<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the FIRE 2022 track: Information Retrieval from Microblogs during Disasters (IRMiDis)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Soham Poddar</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moumita Basu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Saptarshi Ghosh</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kripabandhu Ghosh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Amity University</institution>
          ,
          <addr-line>Kolkata</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Indian Institute of Science Education and Research</institution>
          ,
          <addr-line>Kolkata</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Indian Institute of Technology</institution>
          ,
          <addr-line>Kharagpur</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Microblogging sites such as Twitter play an important role in dealing with various mass emergencies, including natural disasters and pandemics. The FIRE 2022 track on Information Retrieval from Microblogs during Disasters (IRMiDis) focused on two important tasks: (i) detecting the vaccine-related stance of tweets about COVID-19 vaccines, and (ii) detecting the reporting of COVID-19 symptoms in tweets.</p>
      </abstract>
      <kwd-group>
        <kwd>Twitter</kwd>
        <kwd>microblogs</kwd>
        <kwd>COVID-19</kwd>
        <kwd>vaccine stance</kwd>
        <kwd>tweet classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Microblogging social media sites like Twitter are often used to gather real-time information about events happening in the real world, such as disasters and pandemics. The 'Information Retrieval from Microblogs during Disasters' (IRMiDis) series of shared tasks aims to provide datasets and shared tasks for the development of IR/NLP/ML techniques that can be utilized for better management of such crisis situations. In particular, during pandemics such as COVID-19, where complete vaccination is the long-term solution to fight the disease, social media can be utilized to understand public sentiments towards vaccines [<xref ref-type="bibr" rid="ref1">1</xref>, <xref ref-type="bibr" rid="ref2">2</xref>]. During such pandemics, social media can also be utilized to gain real-time insights into the symptoms being reported by people, which can be used to track the spread of the disease [<xref ref-type="bibr" rid="ref3">3</xref>]. To address these two problems, we offered two shared tasks in the FIRE 2022 IRMiDis track.</p>
      <p>[Table 1: class distribution of the test dataset. Recoverable cells: Pro-Vax, Neutral; 245, 229, 244.]</p>
      <sec id="sec-1-1">
        <p>Task 1: COVID vaccine-stance detection: The only long-term remedy for the COVID-19 pandemic seems to be society-scale vaccination. However, quite a few people are skeptical about the use of vaccines for various reasons, including the politics involved and
the fact that vaccines have been rushed into production. It is important to understand public
sentiments towards vaccines, and social media can be used to quickly gather a lot of data about
people talking about vaccines.</p>
        <p>Building an effective classifier to predict user stance (towards vaccines) from social media
posts (e.g., microblogs) is a crucial first step in any analysis of vaccine
stance. In this task, the participants need to develop automated methods to identify the stance
of a tweet (actually, of the user posting the tweet) towards COVID-19 vaccines. The tweets
are to be classified into three classes – Anti-Vax (against vaccines), Pro-Vax (supports vaccines)
and Neutral.</p>
        <p>Task 2: Detection of COVID-19 symptom-reporting in tweets: Quickly identifying people
who are experiencing COVID-19 symptoms is important for authorities to arrest the spread of
the disease. In this task, we specifically explore if tweets that report about someone experiencing
COVID-19 symptoms (e.g., ‘fever’, ‘cough’) can be automatically identified. We call such tweets
“symptom-reporting tweets”.</p>
        <p>Note that simply identifying tweets that contain mentions of COVID-19 symptoms is not
helpful, since many of these tweets are irrelevant. For instance, a tweet
mentioning “weekend football fever” contains the symptom-word “fever” but is clearly not
a symptom-reporting tweet. Similarly, a tweet giving only general information about potential
symptoms of COVID-19 is not a symptom-reporting tweet. In fact, our analyses showed that a
very large majority of tweets that include COVID-symptom words are not symptom-reporting
tweets, i.e., these tweets do not report that some person is experiencing COVID-19 symptoms.
Thus it is important to build an effective classifier to identify which tweets actually report
that someone is experiencing COVID-19 symptoms.</p>
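        <p>The over-selection problem described above can be illustrated with a minimal sketch. It is not part of the track's pipeline: the symptom lexicon, the helper name mentions_symptom and the second and third example tweets are hypothetical, and only the phrase “weekend football fever” comes from the text. All three tweets match on symptom words, although only the last one is a symptom report.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical symptom lexicon; the track's actual keyword list is not reproduced here.
SYMPTOM_WORDS = {"fever", "cough"}

def mentions_symptom(tweet: str) -> bool:
    """Return True if the tweet contains any symptom word as a whole token."""
    tokens = re.findall(r"[a-z]+", tweet.lower())
    return any(tok in SYMPTOM_WORDS for tok in tokens)

examples = [
    "weekend football fever",                            # idiom, not a symptom report
    "Common COVID-19 symptoms include fever and cough",  # general information only
    "I have had a fever and a dry cough since Monday",   # self-report (hypothetical)
]

# Keyword matching cannot tell these cases apart: every tweet is flagged.
flags = [mentions_symptom(t) for t in examples]
print(flags)  # [True, True, True]
```
]]></preformat>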
        <p>In this task, participants need to develop a 4-class classifier that can detect tweets
reporting that someone is experiencing COVID-19 symptoms. The 4 classes are based on whose
symptoms are being reported – (i) Primary Reporting (self-reporting), (ii) Secondary Reporting
(reporting for friends or family members), (iii) Third-party Reporting (reporting for a celebrity),
(iv) Non-Reporting (tweets which do not report anyone experiencing COVID-19
symptoms).</p>
        <p>[Table 2: summary of submitted runs, with columns Team Id, Method and Accuracy. Row alignment was lost in extraction, so the recoverable cells are listed in their extracted order.]</p>
        <p>Team Ids: Data@IITD; IREL; Team-NISER; Thapar Programmers; DataWiz; GAFA; AmiTechies; NLP Learners; UBCS; Vishal Nair; Infrared IR; Subinay; IISERK; SSN_NLP.</p>
        <p>Methods (with the score where one survived): Fine tuned CT-BERT (0.770); CT-BERT (0.582); Pre-trained BERT with data augmentation techniques (0.517); Fine Tuned BERT (0.529); XGBoost with Count Vectorizer; Fine tuned BERT; TF-IDF Vectorizer and Bernoulli Naïve Bayes; TF-IDF Vectorizer and Random Forest; TF-IDF Vectorizer and Multinomial Naïve Bayes; TF-IDF vectorization and MLP Classifier; Doc2Vec and SVM; TF-IDF Vectorizer and SVM; ensemble of xlmroberta, roberta and albert.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Task 1: COVID-19 Vaccine Stance Classification from Tweets</title>
      <p>
        The datasets: The training dataset comprised a set of 2,792 tweets from the dataset provided
by Cotfas et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], which contains COVID vaccine-related tweets posted during Nov-Dec 2020.
The tweets are labeled as Anti-Vax, Pro-Vax or Neutral.
      </p>
      <p>We also collected tweets using various COVID-related keywords, including generic keywords
(e.g. ‘vaccine’, ‘vaxxer’), names of COVID-19 vaccines or their manufacturers (e.g. ‘pfizer’,
‘covaxin’), and so on. We randomly selected 2,400 distinct tweets from all the tweets posted during
March–December 2020. Each of these tweets was labelled into one of the three classes (Anti-Vax,
Pro-Vax, Neutral) by three annotators on the Prolific crowdsourcing platform (https://prolific.co/).
There was at least majority agreement for 2,321 tweets (out of the 2,400), i.e., at least 2 out of
the 3 annotators provided the same label. Of these, 718 tweets were provided as the test
dataset and the rest were added to the training dataset.</p>
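      <p>The majority-agreement filter described above can be sketched as follows. This is a minimal illustration rather than the organizers' actual annotation code: it assumes exactly three labels per tweet, and the tweet identifiers, the example labels and the helper name majority_label are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter
from typing import Dict, List, Optional

def majority_label(labels: List[str]) -> Optional[str]:
    """Return the label given by at least 2 of the 3 annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None

# Hypothetical annotations: three crowd labels per tweet.
annotations: Dict[str, List[str]] = {
    "t1": ["Anti-Vax", "Anti-Vax", "Neutral"],  # majority agreement
    "t2": ["Pro-Vax", "Pro-Vax", "Pro-Vax"],    # unanimous
    "t3": ["Anti-Vax", "Pro-Vax", "Neutral"],   # no majority, so the tweet is dropped
}

# Keep only the tweets on which at least two annotators agree.
kept: Dict[str, str] = {}
for tweet_id, labels in annotations.items():
    label = majority_label(labels)
    if label is not None:
        kept[tweet_id] = label

print(kept)  # {'t1': 'Anti-Vax', 't2': 'Pro-Vax'}
```
]]></preformat>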
      <p>
        Table 1 gives the distribution of classes in the test dataset along with an example tweet in
each class. More details about the data collection and annotation process can be found in the
prior work [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        Methods: In Task 1 we received participation from 13 teams, and 32 runs were submitted.
All the teams used common NLP pre-processing techniques and a variety of classification
strategies, including traditional classifiers (such as Multinomial Naïve Bayes and Support
Vector Machines), neural network-based classifiers (MLP), ensemble methods (such as Random
Forest) and transformer-based pre-trained models such as BERT and Covid-Twitter-BERT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
(abbreviated as CT-BERT). Table 2 summarizes the methods.
      </p>
      <p>Results: We considered two standard classification metrics over the test set – Accuracy and
macro-F1 Score (primary metric). Table 2 ranks the submitted runs in decreasing order of the
primary metric. We observed that recent transformer-based models such as CT-BERT
outperformed all the other models.</p>
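      <p>For reference, the two metrics named above (Accuracy and macro-F1) can be computed as in the following sketch. It is not the track's official evaluation script: the function names and the gold and predicted labels are hypothetical, and it assumes the convention that a class with no true or predicted instances contributes an F1 of 0.</p>
      <preformat><![CDATA[
```python
from typing import List

def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def macro_f1(y_true: List[str], y_pred: List[str], classes: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class that never occurs and is never predicted scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

classes = ["Anti-Vax", "Pro-Vax", "Neutral"]
# Hypothetical gold labels and predictions, for illustration only.
gold = ["Anti-Vax", "Anti-Vax", "Pro-Vax", "Pro-Vax", "Neutral", "Neutral"]
pred = ["Anti-Vax", "Neutral", "Pro-Vax", "Pro-Vax", "Neutral", "Pro-Vax"]

acc = accuracy(gold, pred)
f1 = macro_f1(gold, pred, classes)
print(round(acc, 3), round(f1, 3))  # 0.667 0.656
```
]]></preformat>
      <p>Because macro-F1 averages the per-class scores without weighting by class size, a run that performs poorly on one class is penalized even when its overall accuracy is high.</p>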
    </sec>
    <sec id="sec-3">
      <title>3. Task 2: Detection of COVID-19 Symptom-Reporting in Tweets</title>
      <p>
        The datasets: We crawled English tweets posted from February 2020 to June 2021 using keywords
related to COVID-19 symptoms (e.g., ‘fever’, ‘cough’). This list of symptoms was compiled
from the list of COVID-19 symptoms given by the WHO (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/question-and-answers-hub/q-a-detail/coronavirus-disease-covid-19) and by Sarker et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We took a
random sample from our collected set of tweets and had about 2K tweets annotated into the
four classes by human workers. The four classes were Primary, Secondary, Third-party and
Non-Reporting (as explained in the Introduction). We split this annotated set of tweets in the
ratio 80%-20% and released the two parts as the train and test sets. An example tweet from each of these
classes is given in Table 1, along with the distribution of tweets in the test dataset.
      </p>
      <p>Methods: In Task 2, 6 teams participated and 6 runs were submitted. The participating teams
employed methods similar to those used in Task 1. Table 4 summarizes the methods.</p>
      <p>Results: We considered two standard classification metrics over the test set – Accuracy and
macro-F1 Score (primary metric). Table 4 ranks the submitted runs in decreasing order of the
primary metric.</p>
      <p>The FIRE 2022 IRMiDis track compared the performance of various methods for (i) detecting
the stance of the community regarding COVID vaccines, and (ii) detecting symptom-reporting
COVID-19 tweets. We hope that the test collections developed in this track will be utilized by
the research community to develop better models for both tasks in the future.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>The track organizers thank all the participants for their interest in this track, and the FIRE
authorities for their support in running the track.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Poddar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mondal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ganguly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <article-title>Winds of change: Impact of covid-19 on vaccine-related opinions of twitter users</article-title>
          ,
          <source>in: Proceedings of the International AAAI Conference on Web and Social Media</source>
          , volume
          <volume>16</volume>
          , AAAI Press,
          <year>2022</year>
          , pp.
          <fpage>782</fpage>
          -
          <lpage>793</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.-A.</given-names>
            <surname>Cotfas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Delcea</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Roxin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ioanăş</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. S.</given-names>
            <surname>Gherai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tajariol</surname>
          </string-name>
          ,
          <article-title>The longest month: analyzing covid-19 vaccination opinions dynamics from tweets in the month following the first vaccine announcement</article-title>
          ,
          <source>IEEE Access</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          <fpage>33203</fpage>
          -
          <lpage>33223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lakamana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hogg-Bremer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Al-Garadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <article-title>Self-reported covid-19 symptoms on twitter: an analysis and a research resource</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          <volume>27</volume>
          (
          <year>2020</year>
          )
          <fpage>1310</fpage>
          -
          <lpage>1315</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Salathé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Kummervold</surname>
          </string-name>
          ,
          <article-title>Covid-twitter-bert: A natural language processing model to analyse covid-19 content on twitter</article-title>
          , arXiv preprint arXiv:2005.07503 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>