<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transformer-Based Multi-Task Text Classification of Cryptocurrency Social Media Text with DeBERTaV3-Small</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>V. T. Rushi Kannan</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Renjit</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Indian Institute of Information Technology Kottayam</institution>
          ,
          <addr-line>Kerala</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Koneru Lakshmaiah University</institution>
          ,
          <addr-line>Aziz Nagar, Telangana</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>We present a transformer-based multi-task framework for cryptocurrency discussions: hierarchical subjectivity classification across YouTube, Twitter, and Reddit, and Reddit query-answer relevance prediction. Built on microsoft/deberta-v3-small and enhanced with Focal Loss, Dice Loss, Label Smoothing, and Supervised Contrastive Learning, it addresses class imbalance and improves representation learning. In the Forum for Information Retrieval Evaluation (FIRE 2025) shared task on Opinion Extraction and Question Answering from Cryptocurrency-Related Tweets and Reddit Posts, our system achieved 83.83% validation accuracy and 78.32% Macro-F1 in Level 1, with consistent relevance prediction Macro-F1 near 70%. Official evaluation confirmed that our system ranked top on the hidden test sets, attaining a Macro-F1 of 1.0 and showing robust generalization. This underscores the effectiveness of hierarchical modeling, fold-based ensembling, and advanced loss functions for opinion mining and relevance detection in noisy cryptocurrency discussions.</p>
      </abstract>
      <kwd-group>
        <kwd>Transformers</kwd>
        <kwd>Subjectivity Analysis</kwd>
        <kwd>Relevance Prediction</kwd>
        <kwd>DeBERTaV3-small</kwd>
        <kwd>Supervised Contrastive Learning</kwd>
        <kwd>Focal Loss</kwd>
        <kwd>Dice Loss</kwd>
        <kwd>Label Smoothing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        User-generated content on platforms such as Reddit, Twitter, and YouTube has opened new opportunities
for NLP, while also introducing significant challenges [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. In the context of cryptocurrency, online
discussions are often characterized by informal language, domain-specific jargon, sarcasm, and rapidly
evolving terminology [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. These properties make automated analysis particularly valuable for
understanding sentiment, market behavior, and the spread of misinformation [
        <xref ref-type="bibr" rid="ref5 ref6">5, 6</xref>
        ].
      </p>
      <p>
        This study was carried out as part of the Forum for Information Retrieval Evaluation (FIRE 2025)
CryptoQA shared task, “Opinion Extraction and Question Answering from Cryptocurrency-Related Tweets
and Reddit Posts” [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which provides the dataset, task definitions, and evaluation framework.
      </p>
      <p>
        To address the challenges of noisy and imbalanced data, we propose a modular framework based on
DeBERTaV3-small [
        <xref ref-type="bibr" rid="ref8">8, 9</xref>
        ]. Our approach integrates Dice Loss [10], Label Smoothing [11], Focal Loss [12],
and supervised contrastive learning [13, 14], supported by stratified cross-validation to reduce overfitting.
Experiments demonstrate competitive, and in some cases state-of-the-art, performance, highlighting
the effectiveness of hierarchical modeling and domain-aware fine-tuning in this high-variance setting.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Transformer-based architectures have become the foundation of modern NLP, largely replacing recurrent
and convolutional models through the use of self-attention for efficient sequence modeling [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Building
on this paradigm, pretrained language models such as BERT [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and T5 [15] have achieved strong
generalization across a wide range of NLP tasks. Later advances like DeBERTa [9] and its improved
variant DeBERTaV3 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] introduced disentangled attention and embedding sharing, which further
enhanced contextual representations. Given this balance between accuracy and efficiency, we selected
deberta-v3-small as the backbone encoder in our framework.
      </p>
      <p>
        For hierarchical subjectivity classification, prior work has shown that cascaded architectures can
improve decision-making by progressively narrowing the classification space [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. A key challenge in
these multi-stage pipelines, however, is class imbalance, which often biases models toward majority
classes. To mitigate this, researchers have explored specialized loss functions: Focal Loss to emphasize
hard-to-classify examples [12], Dice Loss to directly optimize overlap-based metrics [10], and Label
Smoothing to improve calibration and generalization [11]. In parallel, contrastive learning has emerged
as an effective technique for enhancing representation quality. Originally popular in computer
vision [13], contrastive methods have since been adapted to NLP through models like SimCSE [16] and
extended to supervised settings for fine-tuning large language models [14, 16].
      </p>
      <p>For query–comment relevance prediction, earlier work in semantic similarity and information
retrieval has demonstrated that transformer encoders can effectively capture contextual alignment
between text pairs. In addition, optimization strategies such as threshold tuning for macro-F1 have
proven useful for handling skewed class distributions.</p>
      <p>
        Cryptocurrency-related discourse introduces further complexity due to its evolving jargon, sarcastic
tone, and frequent misinformation [
        <xref ref-type="bibr" rid="ref4">4, 17</xref>
        ]. To address these challenges, specialized datasets like
CryptOpiQA [18] have enabled systematic evaluation of sentiment, intent, and relevance modeling
in this domain. Our work builds on these foundations by combining hierarchical modeling,
domain-adapted transformers, advanced loss functions, and contrastive learning into a unified framework for
opinion mining and relevance detection in cryptocurrency-focused social media.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Proposed Methodology</title>
      <p>Our approach is intentionally flexible and domain-agnostic, allowing it to be applied to a wide range of
multi-level text classification problems. It integrates a compact transformer backbone with a sequential
(hierarchical) decision process, complementary loss terms for handling imbalance and improving
representation quality, and a robust evaluation protocol. The complete architecture is presented in
Figure 1.</p>
      <sec id="sec-3-1">
        <title>3.1. Overall Framework</title>
        <p>
          The following subsections (3.2–3.5) expand on these components in detail, grouped into preprocessing,
model design, training strategy, and training setup. The system operates as a modular pipeline that
converts raw text into structured predictions through the stages presented in Figure 1:
1. Preprocessing: Normalize and tokenize raw text to form model-ready inputs.
2. Transformer encoding: Extract contextual embeddings using a pretrained DeBERTaV3-small
encoder [
          <xref ref-type="bibr" rid="ref8">9, 8</xref>
          ].
3. Task-specific heads: Attach lightweight classification heads for each level of the hierarchy.
4. Loss computation: Combine focal-like calibration and overlap-aware objectives with supervised
contrastive fine-tuning [12, 10, 13, 14].
5. Evaluation: Assess performance using complementary quantitative metrics and qualitative
embedding inspection [19, 20].
        </p>
        <p>In operation, the pipeline first performs minimal preprocessing, including basic textual cleaning
and tokenization. Token-level inputs are fed into the DeBERTaV3-small transformer encoder to obtain
contextualized sequence representations. The pooled output (from the [CLS] token) is passed to a
task-specific classification head, which produces the final prediction. Figure 1 illustrates the complete
pipeline and a close-up of a single encoder block.</p>
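        <p>The encode, pool, and classify steps described above can be sketched numerically. The shapes below are illustrative assumptions (hidden size 768, three Level 1 classes), and a random array stands in for the DeBERTaV3-small hidden states that the real pipeline would produce:</p>
        <preformat>
```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for encoder output: in the actual system these hidden
# states come from the DeBERTaV3-small encoder (hidden size 768).
batch, seq_len, hidden = 4, 16, 768
hidden_states = rng.normal(size=(batch, seq_len, hidden))

# Pooled output: the representation of the first ([CLS]) token.
pooled = hidden_states[:, 0, :]                      # shape (batch, hidden)

# Lightweight task-specific head: a single linear projection to 3 classes
# (Noise / Objective / Subjective at Level 1), followed by softmax.
n_classes = 3
W = rng.normal(scale=0.02, size=(hidden, n_classes))
b = np.zeros(n_classes)
logits = pooled @ W + b                              # shape (batch, n_classes)
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = probs / probs.sum(axis=1, keepdims=True)
preds = probs.argmax(axis=1)
```
        </preformat>
        <p>Each row of probs is a distribution over the three Level 1 classes, and preds carries the per-sample decision that gates the next level of the hierarchy.</p>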
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Preprocessing pipeline</title>
        <p>Preprocessing is intentionally kept minimal to preserve the original structure and meaning of the text.
The same procedure is applied to all data sources (Reddit, Twitter, YouTube):
• Merge datasets into a single unified corpus.
• Standardize column names across sources.
• Concatenate relevant text fields with platform-specific tags.
• Retain HTML tokens where present.
• Truncate or pad sequences to a fixed maximum length for efficient batching.</p>
        <p>No additional cleaning, stemming, or data augmentation is applied.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Model design</title>
        <sec id="sec-3-3-1">
          <title>3.3.1. Transformer backbone</title>
          <p>The core of the model is the microsoft/deberta-v3-small encoder, chosen for its balance between
performance and computational efficiency.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Classification heads</title>
          <p>A lightweight linear layer projects the pooled sequence embedding into the output space, followed
by an activation function suited to the task. Heads can be trained jointly in a multi-task setting or
independently.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Training strategy</title>
        <p>To handle class imbalance and label noise, a composite loss function is employed. This combines
focal loss, supervised contrastive loss, Dice loss, and label smoothing. The total loss is calculated as a
weighted sum, with weights tuned for stability and performance.</p>
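        <p>The composite objective can be sketched as follows. The component losses follow their standard definitions; the weights passed to total_loss are illustrative placeholders, not the tuned values used in our experiments:</p>
        <preformat>
```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def focal_loss(logits, y, gamma=2.0):
    # Down-weights easy examples by (1 - p_t)^gamma.
    p = softmax(logits)[np.arange(len(y)), y]
    return float(np.mean(-((1.0 - p) ** gamma) * np.log(p + 1e-12)))

def dice_loss(logits, y, eps=1.0):
    # Soft Dice over one-hot targets: 1 - 2|P*Y| / (|P| + |Y|).
    p = softmax(logits)
    onehot = np.eye(logits.shape[1])[y]
    inter = (p * onehot).sum()
    return float(1.0 - (2.0 * inter + eps) / (p.sum() + onehot.sum() + eps))

def smoothed_ce(logits, y, eps=0.1):
    # Cross-entropy against labels smoothed toward the uniform distribution.
    n = logits.shape[1]
    q = np.full((len(y), n), eps / n)
    q[np.arange(len(y)), y] += 1.0 - eps
    logp = np.log(softmax(logits) + 1e-12)
    return float(np.mean(-(q * logp).sum(axis=1)))

def total_loss(logits, y, w_focal=1.0, w_dice=0.2, w_ce=1.0):
    # Weighted sum of the component objectives; weights are placeholders.
    return (w_focal * focal_loss(logits, y)
            + w_dice * dice_loss(logits, y)
            + w_ce * smoothed_ce(logits, y))
```
        </preformat>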
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Training setup and hyperparameters</title>
        <p>Most training settings are shared across tasks, with only a few parameters differing slightly. Key
components include the base model (microsoft/deberta-v3-small), optimizer (AdamW), learning
rate scheduling (CosineAnnealingLR with warmup), mixed precision training (AMP), and training for up
to 5 epochs with early stopping. The complete set of standard hyperparameters used across experiments
is summarized in Table 1. Any task-specific variations (e.g., batch size, input length, or loss function)
are detailed in the case study section.</p>
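        <p>The warmup-plus-cosine-annealing schedule named above can be sketched as a function of the training step. The base learning rate, warmup length, and floor below are illustrative assumptions, not the values from Table 1:</p>
        <preformat>
```python
import math

def lr_at_step(step, total_steps, base_lr=2e-5, warmup_steps=100, min_lr=0.0):
    # Linear warmup to base_lr, then cosine annealing down to min_lr.
    # base_lr, warmup_steps and min_lr are placeholder values.
    if step >= total_steps:
        return min_lr
    if warmup_steps > step:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```
        </preformat>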
        <sec id="sec-3-5-1">
          <title>3.5.1. Cross-validation Strategy</title>
          <p>To ensure robust and unbiased training and evaluation, we employ stratified k-fold cross-validation
with k = 5. This approach preserves the class distribution across folds, which is crucial given the
hierarchical and imbalanced nature of the data. For each fold, the model is trained on 80% of the data
and validated on the remaining 20%. Final performance metrics are computed by averaging results
across all folds, providing a more reliable estimate of model generalization.</p>
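          <p>A stratified fold assignment of this kind, equivalent in spirit to scikit-learn's StratifiedKFold, can be sketched as:</p>
          <preformat>
```python
import numpy as np

def stratified_folds(labels, k=5, seed=42):
    # Assign each sample a fold id so that every class is spread evenly
    # across the k folds, preserving the class distribution per fold.
    labels = np.asarray(labels)
    folds = np.empty(len(labels), dtype=int)
    rng = np.random.default_rng(seed)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        folds[idx] = np.arange(len(idx)) % k
    return folds
```
          </preformat>
          <p>For fold f, training then uses the samples with folds different from f and validates on those equal to f.</p>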
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Case Study</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset</title>
        <p>The dataset used in this study was released as part of the FIRE 2025 Shared Task on
Cryptocurrency-Related Social Media Analysis. It comprised two distinct components:
• Task 1: Hierarchical opinion classification of cryptocurrency-related posts.</p>
        <p>• Task 2: Query–comment relevance prediction in online discussion threads.</p>
        <p>The posts were sourced primarily from Twitter and Reddit, with a smaller portion from YouTube.
These sources reflected a range of communication styles, from short promotional snippets to emotionally
charged opinions and technical discussions. Although the dataset was multilingual, only the English
subset was used for this work.</p>
        <p>In Task 1, each post was annotated across three hierarchical levels, whereas in Task 2,
question–comment pairs were labeled as either relevant or not relevant.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Task 1 — Hierarchical Opinion Classification</title>
        <p>Task 1 categorizes cryptocurrency-related social media posts according to a three-level hierarchy (see
Figure 2):</p>
        <p>1. Level 1 — Content Type: Noise, Objective, or Subjective. Example: “Buy cheap crypto coins
here!!!” → Noise (spam-like, non-informative). Filtering at this stage reduces spam and irrelevant
content, which has been shown to improve opinion mining accuracy.
2. Level 2 — Sentiment Polarity (only for Subjective posts): Positive, Negative, or Neutral.</p>
        <p>Example: “Crypto is a scam, I lost everything” → Negative Subjective (strong negative sentiment).
3. Level 3 — Communicative Intent (only for Neutral Subjective posts): Neutral Statement,
Question, Advertisement, or Miscellaneous. Example: “Will Ethereum outperform Bitcoin in 2025?”
→ Neutral Question.</p>
        <p>This sequential hierarchy improves interpretability and progressively filters content for finer
classification.</p>
        <p>
          Preprocessing: All content from Reddit, Twitter, and YouTube was standardized into a single text
field. For Reddit, the title, selftext, and main columns were concatenated. For Twitter, the tweet
column was used directly, and for YouTube, the comment column was used. A platform-specific tag
([REDDIT], [TWITTER], or [YOUTUBE]) was prepended to encode the source. We intentionally did
not strip HTML tags, as the DeBERTa-v3-small model can leverage raw text including markup [
          <xref ref-type="bibr" rid="ref2">2, 9</xref>
          ].
This unified dataset enabled joint training without separate models [16], while platform tags preserved
linguistic cues and reduced bias [12].
        </p>
        <p>This preprocessing stage forms the foundation of our hierarchical opinion classification framework,
and its role in the overall workflow is shown in Figure 3.</p>
        <p>Filtering for Training: From Level 1 to Level 2, only samples predicted as Subjective were passed
to sentiment classification. From Level 2 to Level 3, only Neutral Subjective samples were retained for
communicative intent classification.</p>
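        <p>This gating between levels can be sketched as a simple predicate filter; the sample strings below are hypothetical:</p>
        <preformat>
```python
def filter_for_next_level(samples, predictions, keep_label):
    # Pass only samples whose predicted label matches the gate for the next
    # level (e.g. "Subjective" from Level 1 to Level 2, "Neutral" from
    # Level 2 to Level 3).
    return [s for s, p in zip(samples, predictions) if p == keep_label]

level2_inputs = filter_for_next_level(
    ["post a", "post b", "post c"],                  # hypothetical posts
    ["Subjective", "Noise", "Subjective"],           # Level 1 predictions
    keep_label="Subjective",
)
```
        </preformat>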
        <p>Model and Training Setup: We adopted DeBERTaV3-small for its balance of efficiency and
accuracy. A 5-fold stratified cross-validation ensured robustness, and final Task 1 results were computed
by averaging predictions across folds in an ensemble.</p>
        <sec id="sec-4-2-1">
          <title>Loss Strategy:</title>
          <p>Level 1 used Focal Loss to address class imbalance. Levels 2 and 3 used a combined loss:</p>
          <p>ℒ_total = ℒ_focal + λ₁ ℒ_sup_con + λ₂ ℒ_dice, (1)</p>
          <p>where λ₁ = 0.3 and λ₂ = 0.2. Here:
• Focal Loss — focuses on hard misclassified samples.
• Supervised Contrastive Loss — improves embedding separability.
• Dice Loss — optimizes label overlap in imbalanced settings.</p>
          <p>Label smoothing (ε = 0.1) further reduced overconfidence.</p>
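          <p>A minimal sketch of the supervised contrastive term, following the standard formulation of Khosla et al. [13] over L2-normalised embeddings (the temperature below is an illustrative assumption, not a tuned value):</p>
          <preformat>
```python
import numpy as np

def supcon_loss(embeddings, labels, tau=0.1):
    # Supervised contrastive loss: embeddings that share a label are pulled
    # together, all others pushed apart. tau is a placeholder temperature.
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / tau
    n = len(labels)
    eye = np.eye(n, dtype=bool)
    sim_masked = np.where(eye, -np.inf, sim)     # exclude self-similarity
    m = sim_masked.max(axis=1, keepdims=True)
    log_denom = np.log(np.exp(sim_masked - m).sum(axis=1)) + m[:, 0]
    loss, count = 0.0, 0
    for i in range(n):
        pos = np.flatnonzero((labels == labels[i]) * (~eye[i]))
        if len(pos) == 0:
            continue
        # mean over positives of -log( exp(sim_ip) / sum over a of exp(sim_ia) )
        loss += float(np.mean(log_denom[i] - sim[i, pos]))
        count += 1
    return loss / max(count, 1)
```
          </preformat>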
          <p>Intermediate Outputs and Pipeline Flow: After each level, predicted class probabilities and labels
were saved as .pkl files. These served as the filtered training set for the subsequent level, ensuring
end-to-end consistency across the hierarchical pipeline.</p>
          <p>Task 2 — Query–Comment Relevance Prediction: In this task, we aim to determine whether a Reddit
comment genuinely addresses the question it follows. This is not as straightforward as matching
keywords — as shown in Figure 4, many replies weave in sarcasm, wander off-topic, or present misleading
information. Correctly identifying such cases as irrelevant is key to ensuring the quality of the relevance
predictions. The relevance task was treated as binary classification, with labels 0 = Irrelevant and
1 = Relevant.</p>
          <p>Data Setup: For this task, each input instance was constructed by concatenating multiple textual
fields from the Reddit dataset in the following order:</p>
          <p>Input Text = title + selftext + [MAIN] + comment.</p>
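          <p>This construction can be sketched as below; the exact joining whitespace and empty-field handling are assumptions, and the example post is hypothetical:</p>
          <preformat>
```python
def build_relevance_input(title, selftext, comment):
    # Concatenate the Reddit post fields in the order described above, with
    # the [MAIN] marker separating the query context from the target comment.
    parts = [title.strip(), selftext.strip(), "[MAIN]", comment.strip()]
    return " ".join(p for p in parts if p)

text = build_relevance_input(
    "Is staking ETH worth it?",                      # hypothetical post title
    "Thinking about locking up my ETH.",             # hypothetical selftext
    "Depends on your horizon; rewards are modest.",  # hypothetical comment
)
```
          </preformat>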
          <p>This representation ensures that the model receives the post’s title and self-description before the target
comment, separated by a special [MAIN] token to mark the transition from query to comment. Given
the substantial class imbalance—where irrelevant comments greatly outnumber relevant ones—we
adopted a class-weighted Focal Loss [12] to mitigate bias towards the majority class, rather than relying
on oversampling or synthetic augmentation. The complete architecture used for this classification task
is shown in Figure 5.</p>
          <p>This shows how the encoded representation of the concatenated input flows through the model’s
components to produce the final relevance prediction.</p>
          <p>Model Configuration: We fine-tuned the microsoft/deberta-v3-small model [9] with a
binary classification head. The focal loss hyperparameters were set to α = 0.75 and γ = 1.5, following
recommendations in [12]. Optimization was performed using the AdamW algorithm with weight decay,
combined with a cosine learning rate scheduler incorporating warmup phases for stable convergence.</p>
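          <p>A minimal sketch of the class-weighted binary focal loss with these hyperparameters, following the standard formulation:</p>
          <preformat>
```python
import numpy as np

def binary_focal_loss(logits, y, alpha=0.75, gamma=1.5):
    # Class-weighted binary focal loss: alpha weights the positive (Relevant)
    # class, (1 - alpha) the negative, and gamma down-weights easy examples.
    p = 1.0 / (1.0 + np.exp(-logits))          # sigmoid probability of class 1
    p_t = np.where(y == 1, p, 1.0 - p)         # probability of the true class
    a_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t + 1e-12)))
```
          </preformat>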
          <p>Training employed 5-fold stratified cross-validation to ensure robustness across balanced splits.
Automatic Mixed Precision (AMP) was used to improve computational efficiency. Unlike Task 1, no
model ensembling was applied.</p>
          <p>
            Model Architecture: Figure 5 illustrates the model pipeline: the concatenated input text is tokenized
and passed through the DeBERTa transformer backbone [
            <xref ref-type="bibr" rid="ref2">2, 9</xref>
            ], followed by a task-specific classification
head to predict the relevance score.
          </p>
          <p>This approach enables effective learning from combined multi-source textual fields while handling
class imbalance through focal loss, thereby improving the model’s focus on difficult, minority-class
examples.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <sec id="sec-5-1">
        <title>5.1. Task 1: Hierarchical Classification Results Summary</title>
        <p>The hierarchical classification pipeline was evaluated on 1,429 validation samples from Reddit (500),
YouTube (500), and Twitter (429). It consists of three levels, with the detailed metrics for each platform
presented in Table 2, and the corresponding confusion matrices (CMs) and UMAP visualizations shown
in Figure 6.</p>
        <p>• Level 1 (Subjectivity Detection) achieved an overall accuracy of 83.8% and a Macro F1 score
of 0.78. Reddit and YouTube performed best at this level, with accuracies of 89.4% and 90.8%
respectively, while Twitter lagged behind at 69.2%.
• Level 2 (Sentiment Classification) processed subjective posts and reached an overall accuracy
of 81.9% and a Macro F1 score of 0.64. Reddit and YouTube maintained strong performance (83.4%
and 82.5% accuracy), while Twitter scored lower at 75.2%.
• Level 3 (Intent Classification) was the most challenging, showing an overall accuracy of 76.0%
and a Macro F1 score of 0.41. Reddit and YouTube achieved 80.1% and 88.9% accuracy, but Twitter
dropped to 23.5% due to noisy and short content as well as limited class diversity.</p>
        <p>Overall, the pipeline demonstrates robust performance on Reddit and YouTube across all levels, while
Twitter’s results reflect inherent platform challenges affecting classification quality.</p>
        <p>[Figure 6 panels: (a) Reddit Level 1 CM — minor misclassifications between Subjective and the other
classes; (b) Reddit Level 1 UMAP — good class separation with clean cluster boundaries; (c) Reddit
Level 2 CM — Neutral dominates, Positive and Negative often confused; (d) Reddit Level 2 UMAP —
dense Neutral clustering, minority classes spread and overlap; (e) Reddit Level 3 CM — heavy bias
toward Questions, Ads and Miscellaneous ignored; (f) Reddit Level 3 UMAP — sparse class points blend
into dominant clusters; (g) Twitter Level 1 CM — noticeable confusion between Subjective and Objective
posts; (h) Twitter Level 1 UMAP — overlapping clusters with less clear boundaries; (i) Twitter Level 2
CM — Positive and Negative sentiments confused with Neutral; (j) Twitter Level 2 UMAP — dense
Neutral cluster with scattered minority classes; (k) Twitter Level 3 CM — sparse classes and noisy data
cause misclassifications; (l) Twitter Level 3 UMAP — sparse class points overlap dominant clusters;
(m) YouTube Level 1 CM — clear distinction for the Subjective and Noise classes; (n) YouTube Level 1
UMAP — clear class separation suggests semantic coherence; (o) YouTube Level 2 CM — some confusion
between Neutral and Negative classes; (p) YouTube Level 2 UMAP — strong separation even with class
imbalance; (q) YouTube Level 3 CM — high alignment with dominant ground-truth classes; (r) YouTube
Level 3 UMAP — strong cluster cohesion for key classes.]</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Task 2: Cross-validation Performance for Relevance Prediction</title>
        <p>Based on the cross-validation results presented in Table 3, Fold 2 was selected for the final submission.
Although its accuracy (0.8819) was slightly below the mean, it achieved the highest Macro F1 score (0.7002)
among all folds, which was prioritized as the primary evaluation metric. The confusion matrix for Fold 2 is
shown in Figure 7. Due to time constraints, ensemble learning techniques—such as model averaging or
stacking across all five folds—were not explored. Such methods could potentially improve performance
beyond relying on a single fold model.</p>
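        <p>The fold-selection criterion can be sketched as below. Only Fold 2's Macro F1 (0.7002) is taken from the text; the other fold scores are hypothetical placeholders standing in for Table 3:</p>
        <preformat>
```python
import numpy as np

def macro_f1(y_true, y_pred, n_classes=2):
    # Unweighted mean of per-class F1 scores.
    f1s = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) * (y_true == c))
        fp = np.sum((y_pred == c) * (y_true != c))
        fn = np.sum((y_pred != c) * (y_true == c))
        prec = tp / (tp + fp) if (tp + fp) else 0.0
        rec = tp / (tp + fn) if (tp + fn) else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if (prec + rec) else 0.0)
    return float(np.mean(f1s))

# Pick the fold whose validation Macro F1 is highest; only fold 2's score is
# from the paper, the rest are illustrative placeholders.
fold_scores = {1: 0.691, 2: 0.7002, 3: 0.688, 4: 0.695, 5: 0.684}
best_fold = max(fold_scores, key=fold_scores.get)
```
        </preformat>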
      </sec>
      <sec id="sec-5-3">
        <title>5.3. FIRE 2025 Task 1 and Task 2 Test Evaluation</title>
        <p>While our framework showed competitive performance during validation, it also exhibited some
limitations. In Task 1, validation accuracy was comparatively lower for the Twitter subset, and in Task 2,
relevance prediction results indicated room for improvement. However, according to confirmation from
the FIRE 2025 CryptoNLP task organizers, our system ultimately achieved the top rank in both Task 1
and Task 2 on the hidden test leaderboard with a reported Macro-F1 score of 1.0. These complementary
results suggest that, despite weaker validation outcomes in certain subsets, the approach generalized
exceptionally well on the unseen test dataset, underscoring the robustness of hierarchical modeling
combined with advanced loss functions.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>This study explored two important NLP problems under the FIRE 2025 Shared Task — hierarchical
opinion classification and Reddit QA relevance prediction. For Task 1, a level-wise modular architecture
was built using DeBERTaV3-small with a hierarchical pipeline of three classifiers. Advanced loss
functions like Focal Loss, Dice Loss, and Supervised Contrastive Learning were used instead of data
augmentation. Intermediate filtered outputs (.pkl) enabled efficient level-based training, and ensemble
evaluation improved generalization across social media platforms. For Task 2, a binary DeBERTa-based
model was trained with class-balanced focal loss and threshold tuning, achieving stable macro-F1 scores.
Fold 2 was chosen for reporting due to its consistent precision–recall trade-off. Future improvements
could include scaling up to larger backbone models (DeBERTaV3 base/large) and applying ensemble
inference techniques such as soft or majority voting. Additionally, retrieval-augmented generation
(RAG), task-specific pretraining, or contrastive alignment between question and comment pairs could
enhance the semantic understanding of long-form crypto discourse. Addressing data sparsity at
fine-grained levels and incorporating prompt-based transformers are also promising directions.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <p>We thank the FIRE 2025 CryptOQA organizers and coordinator for their guidance, and acknowledge Jim
Ureel (Abilene Christian University) for materials that supported improvements in manuscript clarity.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar correction. Figures
were designed by the authors using Canva. All outputs were reviewed and finalized by the authors
to ensure accuracy and originality. No generative AI tools were used for producing the core text or
experimental results.</p>
      <p>[9] P. He, X. Liu, J. Gao, W. Chen, DeBERTa: Decoding-enhanced BERT with disentangled attention,
arXiv preprint arXiv:2006.03654 (2021). [Online]. Available: https://arxiv.org/abs/2006.03654.</p>
      <p>[10] X. Li, X. Sun, Y. Meng, J. Liang, F. Wu, J. Li, Dice loss for data-imbalanced NLP tasks, in: Proc.
58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020, pp. 465–476.
[Online]. Available: https://aclanthology.org/2020.acl-main.45.</p>
      <p>[11] Y. Gao, S. Si, H. Luo, H. Sun, Y. Zhang, Revisiting label smoothing in transformer-based text
sentiment classification, Expert Systems with Applications 220 (2023) 119482. [Online]. Available:
https://arxiv.org/abs/2312.06522.</p>
      <p>[12] J. Mukhoti, V. Kulharia, A. Sanyal, S. Golodetz, P. H. S. Torr, P. K. Dokania, Calibrating deep neural
networks using focal loss, arXiv preprint arXiv:2002.09437 (2020). Accepted at NeurIPS 2020. [Online].
Available: https://arxiv.org/abs/2002.09437.</p>
      <p>[13] P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, D. Krishnan,
Supervised contrastive learning, in: Advances in Neural Information Processing Systems (NeurIPS),
2020. [Online]. Available: https://arxiv.org/abs/2004.11362.</p>
      <p>[14] B. Gunel, J. Du, A. Conneau, V. Stoyanov, Supervised contrastive learning for pre-trained
language model fine-tuning, in: Proc. Int. Conf. Learning Representations (ICLR), 2021. [Online].
Available: https://arxiv.org/abs/2011.01403.</p>
      <p>[15] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring
the limits of transfer learning with a unified text-to-text transformer, Journal of Machine Learning
Research 21 (2020) 1–67. [Online]. Available: https://arxiv.org/abs/1910.10683.</p>
      <p>[16] T. Gao, X. Yao, D. Chen, SimCSE: Simple contrastive learning of sentence embeddings, in: Proc.
Conf. Empirical Methods in Natural Language Processing (EMNLP), 2021. [Online]. Available:
https://arxiv.org/abs/2104.08821.</p>
      <p>[17] V. Koltun, I. Yamshchikov, Pump it: Twitter sentiment analysis for cryptocurrency price prediction,
Information 14 (2023). doi:10.3390/info14070401. [Online]. Available: https://www.mdpi.com/2078-2489/14/7/401.</p>
      <p>[18] S. Sarkar, A. Badwal, A. Roy, K. Rudra, K. Ghosh, CryptOpiQA: A new opinion and question
answering dataset on cryptocurrency, in: Proc. 31st Intl. Conf. Computational Linguistics (COLING),
2025. [Online]. Available: https://aclanthology.org/2025.coling-main.736.pdf.</p>
      <p>[19] J. Opitz, S. Burst, Macro F1 and macro F1, in: Proc. Int. Conf. Machine Learning and Data Mining
(MLDM), 2019, pp. 35–48. [Online]. Available: https://arxiv.org/abs/1911.03347.</p>
      <p>[20] L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for
dimension reduction, arXiv preprint arXiv:1802.03426 (2018). [Online]. Available: https://arxiv.org/abs/1802.03426.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, in: Proc. Conf. North American Chapter of the Association for Computational Linguistics (NAACL), 2019. [Online]. Available: https://arxiv.org/abs/1810.04805.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in Neural Information Processing Systems (NeurIPS), 2017. [Online]. Available: https://arxiv.org/abs/1706.03762.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mohapatra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Crypto-Emotion BERT: A multi-label transformer-based framework for emotion detection in cryptocurrency tweets</article-title>
          ,
          <source>Procedia Computer Science</source>
          <volume>199</volume>
          (
          <year>2022</year>
          )
          <fpage>872</fpage>
          -
          <lpage>879</lpage>
          . doi:10.1016/j.procs.2022.01.109. [Online]. Available: https://doi.org/10.1016/j.procs.2022.01.109.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K. I.</given-names>
            <surname>Roumeliotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Tselikas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Nasiopoulos</surname>
          </string-name>
          ,
          <article-title>LLMs and NLP models in cryptocurrency sentiment analysis: A comparative classification study</article-title>
          ,
          <source>Big Data and Cognitive Computing</source>
          (
          <year>2024</year>
          ). doi:10.3390/bdcc8060063. [Online]. Available: https://doi.org/10.3390/bdcc8060063.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. M.</given-names>
            <surname>Lucey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Urquhart</surname>
          </string-name>
          ,
          <article-title>From whales to waves: Social media sentiment, volatility, and whales in cryptocurrency markets</article-title>
          ,
          <source>Journal of Financial Markets</source>
          (
          <year>2025</year>
          ). doi:10.1016/j.bar.2025.101682. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0890838925001325.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Urquhart</surname>
          </string-name>
          ,
          <article-title>Sentiment matters for cryptocurrencies: Evidence from tweets</article-title>
          ,
          <source>Data</source>
          <volume>10</volume>
          (
          <year>2023</year>
          ). doi:10.3390/data10040050. [Online]. Available: https://www.mdpi.com/2306-5729/10/4/50.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <collab>FIRE 2025 Shared Task Organizers</collab>
          ,
          <article-title>CryptoQA 2025: Subjectivity and relevance analysis on social media</article-title>
          ,
          <year>2025</year>
          . [Online]. Available: https://sites.google.com/view/cryptoqa-2025/task-description. Accessed: July 2025.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DeBERTaV3: Improving DeBERTa using ELECTRA-style pre-training with gradient-disentangled embedding sharing</article-title>
          ,
          <source>arXiv preprint arXiv:2111.09543</source>
          ,
          <year>2022</year>
          . [Online]. Available: https://arxiv.org/abs/2111.09543.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>