<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Forum for Information Retrieval Evaluation</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Overview of CryptOQA: Opinion Extraction and Question Answering from CryptoCurrency-Related Tweets and Reddit posts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Kripabandhu Ghosh</string-name>
          <email>kripaghosh@iiserkol.ac.in</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Koustav Rudra</string-name>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>Deloitte</institution>
          ,
          <addr-line>Kolkata, West Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>IISER Kolkata</institution>
          ,
          <addr-line>Mohanpur, West Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>IIT Jodhpur</institution>
          ,
          <addr-line>Jodhpur, Rajasthan</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>1</volume>
      <fpage>2</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Cryptocurrency is an exponentially growing domain, and Twitter and Reddit are important outlets for users to express their opinions, ask questions, or discuss specific topics. The CryptOQA track at FIRE 2024 aims to develop systems that automatically assess cryptocurrency posts on social media. The track comprises two tasks: (a) the classification of cryptocurrency-related posts into eight classes, namely, Noise, Objective, Positive, Negative, Neutral, Question, Advertisement, and Miscellaneous, divided across three levels, and (b) given a question-answer pair, detecting whether the answer is relevant to the question.</p>
      </abstract>
      <kwd-group>
        <kwd>Cryptocurrency</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>Classification</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Social Media</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Over the last decade, the emergence of new cryptocurrencies has significantly altered the evolution
of the global economy, triggering complex debates on multiple online platforms. Social networks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
like Reddit and Twitter are among the primary sources for public discourse on cryptocurrency market
changes, trends, and technologies, offering researchers a valuable yet challenging stream of big data
in the form of text, images, videos, etc. A broad spectrum of sentiments [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is expressed
in social media posts related to cryptocurrencies, and these span various formats such as
questions, enquiries, opinions, advertisements, or objective statements [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        By classifying the sentiments of these posts, individuals and organizations can gain better insights
into trends in public opinion. Accurate classification of sentiments, ranging from
positive and negative to neutral, helps in predicting market trends, understanding consumer behaviour,
and shaping marketing strategies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] effectively. However, the sentiment classification task in this
context poses many challenges due to the diverse and unstructured nature of social media content.
One of the primary challenges is the inherent variability and ambiguity of social media language, which
often involves abbreviations and slang. It is difficult for a traditional text classification model to
accurately interpret the nuances of expression in social media posts, which can be short, informal, and
slang-laden [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Furthermore, the task is complicated because the way in which sentiment is expressed in posts
differs significantly. These problems need to be addressed with advanced models [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] to account for
the linguistic nuances of social media postings, thereby providing a refined classification of
sentiments. Therefore, the aim of this track is to develop systems that provide effective
and precise automated monitoring of social media cryptocurrency discussions. Such a system should be
able to classify content into fine-grained sentiment, fact, and opinion categories, and to separate
noise from factual and opinionated content. Moreover, the system must also be
able to identify effective replies to questions related to cryptocurrency. In addition to the classification
task, this track focused on the queries and concerns that potential crypto investors may
have, attempting to determine whether a given comment was an appropriate response to a question.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <p>The dataset is drawn from two sources: Reddit and Twitter social media posts. The data
is divided between the classification and QnA tasks. The classification dataset is
annotated at three levels:
1. Level 1: There are three classes, NOISE, OBJECTIVE, and SUBJECTIVE, marked with 0, 1, and 2, respectively.
2. Level 2: The SUBJECTIVE class is further divided into three categories, NEUTRAL,
NEGATIVE, and POSITIVE, marked with 0, 1, and 2, respectively, in the dataset.
3. Level 3: There are four classes, namely, NEUTRAL SENTIMENTS, QUESTIONS,
ADVERTISEMENTS, and MISCELLANEOUS, marked with 0, 1, 2, and 3, respectively. This set
of classes branches from the NEUTRAL category in level 2.</p>
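      <p>The three-level annotation scheme above can be sketched as a small label-hierarchy helper. This is an illustrative assumption of how a flat leaf class maps to its per-level codes; the class names and numeric codes come from the task description, but the function itself is hypothetical, not the official encoding.</p>
      <preformat>
```python
# Hypothetical sketch of the three-level label hierarchy described above.
LEVEL1 = {"NOISE": 0, "OBJECTIVE": 1, "SUBJECTIVE": 2}
LEVEL2 = {"NEUTRAL": 0, "NEGATIVE": 1, "POSITIVE": 2}          # under SUBJECTIVE
LEVEL3 = {"NEUTRAL_SENTIMENTS": 0, "QUESTIONS": 1,
          "ADVERTISEMENTS": 2, "MISCELLANEOUS": 3}             # under NEUTRAL

def hierarchical_path(label):
    """Return the (level1, level2, level3) code path for a leaf class."""
    if label in LEVEL1:                       # NOISE / OBJECTIVE stop at level 1
        return (LEVEL1[label], None, None)
    if label in ("NEGATIVE", "POSITIVE"):     # sentiment leaves stop at level 2
        return (LEVEL1["SUBJECTIVE"], LEVEL2[label], None)
    # remaining leaf classes branch from NEUTRAL at level 2
    return (LEVEL1["SUBJECTIVE"], LEVEL2["NEUTRAL"], LEVEL3[label])

print(hierarchical_path("QUESTIONS"))   # (2, 0, 1)
```
      </preformat>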
      <p>The hierarchical data distribution in Twitter and Reddit datasets is shown in Figure 1.</p>
      <p>The QnA task has a total of 31,614 samples across both data sources (Twitter and Reddit combined).
Each sample is labeled Relevant or Non-relevant; the training and test sets contain 25,290 and 6,324
samples, respectively.</p>
      <sec id="sec-2-1">
        <title>2.1. Training data statistics</title>
        <p>The distribution of training data for the Twitter and Reddit samples, labeled at three different levels
across 8 categories, is depicted in Figures 2 and 3, respectively. For the Twitter training data, there
are 1,745, 1,553, and 1,689 training samples of the SUBJECTIVE, NOISE, and OBJECTIVE classes in
level 1. In level 2, the SUBJECTIVE class is divided into three categories, with 363, 69, and
1,260 samples for the POSITIVE, NEGATIVE, and NEUTRAL classes, respectively. In level 3, the NEUTRAL category is
further split into 177, 280, 677, and 59 training samples of the NEUTRAL SENTIMENTS, QUESTIONS,
ADVERTISEMENTS, and MISCELLANEOUS classes. Similarly, for Reddit, the distribution of samples
across the NOISE, OBJECTIVE, POSITIVE, NEGATIVE, NEUTRAL, QUESTIONS, ADVERTISEMENTS,
and MISCELLANEOUS classes is 645, 503, 259, 410, 476, 2,390, 212, and 105, respectively.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Test data statistics</title>
        <p>The opinion test dataset contains one sub-dataset for each social media platform (Twitter and
Reddit). The total volume of the test dataset is 1,000 texts (500 per platform).</p>
        <p>QnA Dataset: The question-answering (QnA) dataset contains question-answer pairs
labeled Relevant or Not Relevant, according to whether the answer is relevant to the question.
The QnA corpus is built from posts bearing the Question label in the opinion dataset. The test dataset
contains 6,324 QnA pairs, of which 888 are labeled relevant and 5,436 non-relevant. Figure 4 shows
the distribution of the Reddit and Twitter data.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Task Definition</title>
      <p>Task 1 is to develop a classification model that classifies cryptocurrency-related social media posts into
eight classes, namely, Noise, Objective, Positive, Negative, Neutral, Question, Advertisement, and Miscellaneous.</p>
      <p>Task 2 required participants to identify all answers relevant to a given question on cryptocurrency.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Participants</title>
      <p>Four teams from various academic institutions made final submissions to the CryptOQA shared
task at FIRE 2024, which focused on classifying social media posts related to cryptocurrency. The varied
strategies and models employed by the respective teams for the given tasks are summarized below:
1. Team MUCS (Mangalore University) approached this challenge with two deep learning models:
(a) Unique_Label_LSTM, which uses a unique labeling method for hierarchical classification, and
(b) HCC_LSTM, a hierarchical classifier chain model that uses an LSTM [7] internally. The latter
model achieved better performance than the former, reporting macro F1 scores of 0.574 and
0.328 for Twitter and Reddit, respectively.
2. Team TextTitans (IIT Kharagpur) leveraged large language models (LLMs), namely GPT-4-Turbo,
for the opinion classification and question-answering tasks. A 64-shot prompting technique was used
to categorize social media posts [8]. They reported scores of 0.266 and 0.249 on Twitter and
Reddit, respectively. However, they ranked 1st in the question-answering task with a score of
0.157.
3. Team COM presented two frameworks based on transformer models, namely, XLM-RoBERTa [9]
for single-level classification and RoBERTa-base for 3-level hierarchical classification, with a
separate RoBERTa model at each level. The posts were first classified as Noise,
Objective, or Subjective at level 1. Subjective posts were further classified into Neutral, Negative,
or Positive sentiments at level 2. Finally, Neutral posts were classified into Neutral-Sentiment,
Questions, Advertisements, or Miscellaneous at level 3. In contrast, the single-level framework
used an XLM-RoBERTa model to categorize posts into one of eight classifications. They obtained
the scores of 0.778 and 0.542 on Twitter and Reddit opinion classification tasks, respectively.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Methodologies</title>
      <p>The submitted solutions across teams participating in the CryptOQA shared task employed a range
of techniques for classifying cryptocurrency-related social media posts. These methodologies can be
broadly categorized into four techniques, namely, transformer-based models, hierarchical classification,
LSTM models [10], and prompt-based learning. To address the challenges posed, each team adopted
one or more of these techniques.</p>
      <p>Transformer-based Models These models are known for their capacity to capture context and
relationships within text, leveraging attention mechanisms. This approach is a popular choice for most
teams, as it effectively handles the nuanced and domain-specific language often found in
cryptocurrency-related social media posts.</p>
      <p>• RoBERTa: This model is heavily pre-trained on robust and large datasets to capture the intricacies
of the text. RoBERTa is used by Team COM.
• XLM-RoBERTa: Team COM developed a single-level multi-label classification framework
by fine-tuning an XLM-RoBERTa-base model, a multilingual transformer
that extends RoBERTa to one hundred languages. This was done to leverage the ability of
XLM-RoBERTa to generalize across a variety of data formats.</p>
      <p>Hierarchical Classification Hierarchical classification is the process of creating a tree-like structure
for the classification problem that supports predictions at different granularities and
handles multi-label classification in a simple way. The teams that used this approach
partitioned the data into several levels, where each level captures more fine-grained distinctions.
• Hierarchical Classifier Chain (HCC_LSTM): This strategy, used by Team MUCS, places an LSTM
classifier at each level of the hierarchy to manage hierarchical relations.
• 3-level RoBERTa Hierarchical Framework: Team COM utilized this method to classify posts into
increasingly fine-grained categories.</p>
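      <p>The level-by-level routing used by the hierarchical approaches can be sketched as follows. This is a minimal illustration, not the teams' actual code: the three predict_* functions are hypothetical stubs standing in for trained level-specific models (an LSTM in HCC_LSTM, a RoBERTa in Team COM's framework).</p>
      <preformat>
```python
# Sketch of a hierarchical classifier chain; the predict_* stubs stand in
# for trained per-level models and always return fixed values here.

def predict_level1(post):
    return "SUBJECTIVE"   # placeholder for a level-1 model

def predict_level2(post):
    return "NEUTRAL"      # placeholder for a level-2 model

def predict_level3(post):
    return "QUESTIONS"    # placeholder for a level-3 model

def classify(post):
    """Route a post down the hierarchy; deeper levels run only when needed."""
    level1 = predict_level1(post)
    if level1 in ("NOISE", "OBJECTIVE"):
        return level1
    level2 = predict_level2(post)          # only SUBJECTIVE posts reach level 2
    if level2 in ("NEGATIVE", "POSITIVE"):
        return level2
    return predict_level3(post)            # only NEUTRAL posts reach level 3

print(classify("How do I stake my coins?"))   # QUESTIONS with these stubs
```
      </preformat>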
      <p>LSTM-based Models Long Short-Term Memory (LSTM) models can process sequential
data and capture long-term dependencies between distant parts of a text, making them suitable for
tasks where sentence context is crucial.</p>
        <p>• Unique_Label_LSTM: Team MUCS leveraged this model, which applies a unique labeling scheme for
hierarchical classification.
• BiLSTM for Question-Answering (QnA): A BiLSTM model was used by Team COM to classify
comments as relevant or non-relevant in the QnA task.</p>
      <p>Prompt-based Learning and Few-shot Techniques Prompting was used, especially
in conjunction with large language models (LLMs), to exploit their pre-trained knowledge. This approach
is particularly beneficial when little or no labeled data is available, as the prompt directs the model in
formulating its answers. Variants include zero-shot prompting, where no examples are
provided, and few-shot prompting, where a limited number of examples are given, among others.
• GPT-4-Turbo with 64-shot Learning: Team TextTitans employed a few-shot learning technique
[11] that allowed the model to classify with very few examples without requiring a significantly
large labeled dataset.</p>
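      <p>A few-shot classification prompt of the kind described above might be assembled as in the following sketch. The example posts, labels, and instruction wording are hypothetical, not the exact prompt used by Team TextTitans, which used 64 examples rather than the 3 shown here.</p>
      <preformat>
```python
# Illustrative few-shot prompt construction; examples are hypothetical.
FEW_SHOT_EXAMPLES = [
    ("BTC just broke its all-time high, feeling great!", "Positive"),
    ("What wallet should I use for staking ETH?", "Question"),
    ("Buy DogeMoon now, 100x guaranteed!!!", "Advertisement"),
]

def build_prompt(post, examples=FEW_SHOT_EXAMPLES):
    """Compose a few-shot prompt: labeled examples followed by the new post."""
    lines = ["Classify the cryptocurrency post into one of: Noise, Objective, "
             "Positive, Negative, Neutral, Question, Advertisement, Miscellaneous."]
    for text, label in examples:
        lines.append(f"Post: {text}\nLabel: {label}")
    lines.append(f"Post: {post}\nLabel:")   # model completes the final label
    return "\n\n".join(lines)

print(build_prompt("Is Solana a good long-term hold?"))
```
      </preformat>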
    </sec>
    <sec id="sec-6">
      <title>6. Result</title>
      <p>The results of the CryptOQA task highlight that transformer-based models are superior at categorizing
social media posts related to cryptocurrency. Team COM obtained the best macro F1 scores for Task 1,
attaining 0.778 for Twitter and 0.542 for Reddit using an XLM-RoBERTa-based single-level
classification framework.</p>
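      <p>Macro F1, the metric reported above, is the unweighted mean of the per-class F1 scores, so rare classes count as much as frequent ones. A minimal from-scratch sketch (in practice one would use a library implementation such as scikit-learn's f1_score with average set to macro):</p>
      <preformat>
```python
# Minimal from-scratch macro F1: average the F1 score over all classes.
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)

print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```
      </preformat>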
      <p>Team MUCS used hierarchical classification models, which also proved efficient, although Team
COM’s hierarchical model was notably less successful than their single-level model. Lastly, the
TextTitans team reported F1 scores of 0.266 and 0.249 on the Twitter and Reddit datasets, respectively.</p>
      <sec id="sec-6-9">
        <title>Task 2</title>
        <p>For the question-answering scenario in Task 2, the findings are more varied. Team TextTitans
reported the highest score of 0.157 using a prompt-based approach on the GPT-4-Turbo model.
The performance of the other teams was lower; Team COM’s BiLSTM model, for instance,
scored 0.146. In general, transformer-based models were the most successful in Task 1,
achieving higher scores, while prompt-based techniques performed best in the QnA task. All the
contributions are listed in Table 1.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>The goal of the CryptOQA task is to evaluate post classification and QnA tasks related to
cryptocurrencies using ML and NLP techniques. Among the submissions, transformer-based approaches, such as
RoBERTa and XLM-RoBERTa, consistently yielded better results. The use of transformers produced the
highest scores across the Twitter and Reddit datasets, with Team COM topping the performance using
XLM-RoBERTa for single-level classification. The hierarchical classification
models still performed poorly compared to their single-level counterparts. Approaches based on
prompt learning, especially the few-shot models, worked well in the QnA task. Overall, these
findings provide an important basis for further development in the rapidly evolving cryptocurrency market.</p>
      <p>Declaration on Generative AI: During the preparation of this work, the author(s) used Grammarly for grammar and spelling
checking and rewording. After using this tool/service, the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
      <p>[7] N. Aslam, F. Rustam, E. Lee, P. B. Washington, I. Ashraf, Sentiment analysis and emotion
detection on cryptocurrency related tweets using ensemble lstm-gru model, IEEE Access 10 (2022)
39313–39324. doi:10.1109/ACCESS.2022.3165621.
[8] P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, G. Neubig, Pre-train, prompt, and predict: A systematic
survey of prompting methods in natural language processing, ACM Comput. Surv. 55 (2023). URL:
https://doi.org/10.1145/3560815. doi:10.1145/3560815.
[9] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott,
L. Zettlemoyer, V. Stoyanov, Unsupervised cross-lingual representation learning at scale, 2020.
URL: https://arxiv.org/abs/1911.02116. arXiv:1911.02116.
[10] X. Huang, W. Zhang, X. Tang, M. Zhang, J. Surbiryala, V. Iosifidis, Z. Liu, J. Zhang, Lstm based
sentiment analysis for cryptocurrency prediction, 2021. URL: https://arxiv.org/abs/2103.14804.
arXiv:2103.14804.
[11] Z. Li, S. Fan, Y. Gu, X. Li, Z. Duan, B. Dong, N. Liu, J. Wang, Flexkbqa: A flexible llm-powered
framework for few-shot knowledge base question answering, 2024. URL: https://arxiv.org/abs/2308.12060.
arXiv:2308.12060.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H. N. Durmuş</given-names>
            <surname>Şenyapar</surname>
          </string-name>
          ,
          <article-title>Cryptocurrency on social media: Analyzing the digital discourse towards the coin market</article-title>
          <volume>9</volume>
          (
          <year>2024</year>
          )
          <fpage>202</fpage>
          -
          <lpage>223</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K. I.</given-names>
            <surname>Roumeliotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Tselikas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Nasiopoulos</surname>
          </string-name>
          ,
          <article-title>Llms and nlp models in cryptocurrency sentiment analysis: A comparative classification study</article-title>
          ,
          <source>Big Data and Cognitive Computing</source>
          <volume>8</volume>
          (
          <year>2024</year>
          ). URL: https://www.mdpi.com/2504-2289/8/6/63. doi:10.3390/bdcc8060063.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Nizzoli</surname>
          </string-name>
          ,
          <article-title>Leveraging social media and AI to foster secure societies against online and offline threats</article-title>
          ,
          <source>Ph.D. thesis</source>
          ,
          <year>2021</year>
          . doi:10.13140/RG.2.2.29807.97446.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Oikonomopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Tzafilkou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Karapiperis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Verykios</surname>
          </string-name>
          ,
          <article-title>Cryptocurrency price prediction using social media sentiment analysis</article-title>
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . doi:10.1109/IISA56318.2022.9904351.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1908.10084. arXiv:1908.10084.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-H.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. G.</given-names>
            <surname>Choi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lim</surname>
          </string-name>
          ,
          <article-title>A deep learning-based cryptocurrency price prediction model that uses on-chain data</article-title>
          ,
          <source>IEEE Access</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>56232</fpage>
          -
          <lpage>56248</lpage>
          . doi:10.1109/ACCESS.2022.3177888.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>