<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Overview of CryptOQA: Opinion Extraction and Question Answering from CryptoCurrency-Related Tweets, Reddit, and YouTube Posts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dhruv Kumar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Somrupa Sarkar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Koustav Rudra</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kripabandhu Ghosh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computational and Data Sciences (CDS), IISER Kolkata</institution>
          ,
          <addr-line>Mohanpur, West Bengal</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IIT Kharagpur</institution>
          ,
          <addr-line>Kharagpur</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>Cryptocurrency continues to dominate online discourse, with platforms such as Twitter, Reddit, and YouTube serving as major hubs for public opinion, market speculation, and user-driven Q&amp;A. The CryptOQA track at FIRE 2025 aims to develop systems that can automatically analyze and categorize cryptocurrency-related social media posts. This year, the track consisted of two tasks: (a) classifying posts into a three-level hierarchical label space covering categories such as Noise, Objective, and Subjective (and its subtypes), and (b) detecting whether an answer is relevant to a given question in question-answer pairs. In total, five teams participated in the track, submitting a wide range of approaches that were evaluated primarily using the macro F1-score across all datasets and task levels. The participating systems demonstrated strong performance in handling challenges such as sentiment ambiguity, noisy and unstructured text, and multi-platform variability, offering valuable insights into modeling cryptocurrency discourse on social media.</p>
      </abstract>
      <kwd-group>
<kwd>Cryptocurrency</kwd>
        <kwd>Information Retrieval</kwd>
        <kwd>Classification</kwd>
        <kwd>Question Answering</kwd>
        <kwd>Social Media</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Over the past decade, the emergence of new cryptocurrencies has had a profound impact on digital
financial ecosystems. This shift has sparked ongoing conversations across various social media platforms.
These platforms, such as Twitter, Reddit, and YouTube, are key places where users share opinions,
ask questions, and discuss updates related to cryptocurrency technologies and market changes. The
amount, variety, and speed of user-generated content create a significant and challenging issue for
researchers who want to analyze large data streams consisting of text, images, and videos. Previous
studies show that social media discussions about cryptocurrencies often reflect a mix of feelings, ranging
from positive and negative responses to neutral comments, factual statements, ads, and user questions
[
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        Classifying these sentiments can offer useful insights into public opinion, new market trends, and
user behavior. Correctly identifying subjective and objective content has been found to aid
decision-making, improve market analysis, and allow better tracking of cryptocurrency-related conversations
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, sentiment classification in this area remains challenging due to the unstructured nature
of social media text. Posts often feature abbreviations, slang, evolving technical terms, and stylistic
differences that make language interpretation difficult. Traditional text classification models often fail
to capture these subtleties, especially when the expressions are short, casual, or dependent on context
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        In addition to opinionated content, cryptocurrency discussions also include question-and-answer
exchanges, where users seek guidance or clarification about investments, market conditions, or technical
details. Often, community responses can be incomplete, irrelevant, or misleading. Thus, determining
whether a response effectively addresses the posed question is crucial for ensuring reliable access to
information in cryptocurrency communities [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>To tackle these issues, the track focuses on building systems that can
perform two main tasks: (i) fine-grained classification of cryptocurrency-related posts into a hierarchical set
of labels that differentiate noise, objective information, and subjective opinions, and (ii) assessing the
relevance of responses in question-and-answer pairs. These tasks aim to support automated and ongoing
monitoring of social media discussions, fostering a more organized understanding of cryptocurrency
narratives across various platforms.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Dataset</title>
      <p>The data come from three sources, namely Reddit, Twitter, and YouTube, all containing social media
posts related to cryptocurrency. The data are divided between the classification and Q&amp;A tasks. The
classification dataset carries three-level annotations, as illustrated in the sketch after this list:
1. Level 1: There are three classes: Noise, Objective, and Subjective, marked with 0, 1, and 2,
respectively.
2. Level 2: The Subjective class is further divided into three categories: Neutral, Negative, and
Positive, marked with 0, 1, and 2, respectively.
3. Level 3: The Neutral category of Level 2 branches into four classes: Neutral Sentiments, Questions,
Advertisement, and Miscellaneous, marked with 0, 1, 2, and 3, respectively.</p>
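      <p>For concreteness, the hierarchy can be written down as a small mapping from integer codes to class
names and flattened into the eight final categories used in Task 1. The Python sketch below is our own
encoding of the scheme described above, not an official artifact of the track:</p>
      <preformat>
# Sketch of the CryptOQA three-level label hierarchy (our own encoding,
# not an official artifact of the track). Integer codes follow the
# description above.
LEVEL1 = {0: "Noise", 1: "Objective", 2: "Subjective"}
LEVEL2 = {0: "Neutral", 1: "Negative", 2: "Positive"}      # refines Subjective
LEVEL3 = {0: "Neutral Sentiments", 1: "Questions",
          2: "Advertisement", 3: "Miscellaneous"}          # refines Neutral

def flat_label(l1, l2=None, l3=None):
    """Map a (level-1, level-2, level-3) code triple to one of the
    eight final categories used in Task 1."""
    if l1 != 2:            # Noise and Objective stop at level 1
        return LEVEL1[l1]
    if l2 != 0:            # Negative and Positive stop at level 2
        return LEVEL2[l2]
    return LEVEL3[l3]      # the Neutral branch splits at level 3

print(flat_label(2, 0, 1))  # -> "Questions"
      </preformat>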
      <p>The hierarchical data distribution in the Twitter, Reddit, and YouTube datasets is shown in Figure 1.
The Q&amp;A task has a total of 31,692 samples across all the data sources (Twitter, Reddit, and YouTube
combined). Each sample is labeled as Relevant or Non-relevant, and the data are split into 25,369 training
and 6,323 test samples. The dataset contains 3,704 datapoints for the Relevant class and 21,531 datapoints
for the Non-relevant class.</p>
      <sec id="sec-2-1">
        <title>2.1. Training data statistics</title>
        <p>The training data provided for this evaluation contain posts from Reddit, Twitter, and YouTube, annotated
across three hierarchical levels and mapped to eight final categories. The distribution for each platform
is reported in Figures 2, 3, and 4.</p>
        <p>For Reddit, the Level 1 labels contain 645 Noise, 503 Objective, and 3,852 Subjective posts. In Level 2,
the Subjective class is divided into 259 Positive, 410 Negative, and 3,183 Neutral posts. In Level 3, the
Neutral posts are further split into 476 Neutral Sentiments, 2,390 Questions, 105 Advertisements, and
212 Miscellaneous samples.</p>
        <p>For Twitter, the Level 1 distribution includes 1,338 Noise, 1,702 Objective, and 1,248 Subjective
posts. These Subjective posts are divided in Level 2 into 270 Positive, 78 Negative, and 902 Neutral
posts. The Level 3 breakdown of Neutral posts consists of 178 Neutral Sentiments, 136 Questions, 544
Advertisements, and 46 Miscellaneous entries.</p>
        <p>For YouTube, Level 1 consists of 786 Noise, 32 Objective, and 4,182 Subjective posts. In Level 2,
the Subjective posts include 207 Positive, 1,574 Negative, and 2,401 Neutral samples. Level 3 further
categorizes the Neutral posts into 1,391 Neutral Sentiments, 1,000 Questions, 1 Advertisement, and 9
Miscellaneous posts.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Test data statistics</title>
        <p>The test split contains 500 posts each from Reddit, Twitter, and YouTube, totaling 1,500 posts across
platforms. Each platform follows the same three-level hierarchical label design as the training data,
ensuring consistency for evaluation.</p>
        <p>For Q&amp;A, the dataset includes 6,323 paired entries consisting of a question, its corresponding comment,
and a binary relevance label indicating whether the comment addresses the question.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Task Definition</title>
      <p>Task 1 is to develop a classification model to classify cryptocurrency-related social media posts into
eight classes, namely, Noise, Objective, Positive, Negative, Neutral, Question, Advertisement, and
Miscellaneous.</p>
      <p>Task 2 required participants to identify all answers relevant to a given question on cryptocurrency.</p>
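      <p>Both tasks were evaluated primarily with the macro-averaged F1-score, which computes F1 per class
and then takes the unweighted mean, so rare classes count as much as frequent ones. A minimal scoring
sketch in Python (illustrative, not the official evaluation script):</p>
      <preformat>
# Minimal macro-F1 scoring sketch (illustrative, not the official
# evaluation script). Macro averaging gives every class equal weight.
from sklearn.metrics import f1_score

def score_run(gold, predicted):
    return f1_score(gold, predicted, average="macro")

# Toy example over the eight Task 1 class codes 0..7.
gold = [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
pred = [0, 1, 2, 3, 4, 5, 6, 0, 0, 1]
print(f"macro F1 = {score_run(gold, pred):.4f}")
      </preformat>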
    </sec>
    <sec id="sec-4">
      <title>4. Participants</title>
      <p>
        Five teams representing various academic institutions made final submissions to the CryptOQA
shared task at FIRE 2025, which focuses on classifying cryptocurrency-related social media posts. The
strategies adopted by the respective teams are summarized below:
1. Team KLU (Koneru Lakshmaiah University) submitted a multi-stage transformer architecture
built on DeBERTa-V3-Small [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where a shared encoder feeds multiple task-specific heads for
each hierarchical level as well as for relevance prediction. Their design integrates focal loss
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Dice loss [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], supervised contrastive learning [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and label smoothing [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] within a unified
multi-loss framework, and uses conditional routing between levels. The system also employs
5-fold ensembling and layer-wise learning-rate decay for stable optimization. This approach
achieved the highest performance in the shared task, ranking first with a macro F1 of 1.00 in Task
1 and 1.00 in Task 2.
2. Team NLPFusion (Mangalore University) implemented a multi-task transformer model using
BERT [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and CryptoBERT encoders to jointly learn all three hierarchical levels, supported by
task-specific recurrent modules and shared optimization across levels. Their relevance prediction
system uses transformer encoders to process concatenated question–answer pairs. This joint
modeling strategy resulted in competitive performance, with the team ranking second in both
tasks, with a macro F1 of 0.6253 in Task 1 and 0.7940 in Task 2.
3. Team IISER (Indian Institute of Science Education and Research) adapted the Gemma-1B LLM
[
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for 8-class flat classification by replacing the original output projection layer with a custom
classification head and fine-tuning only the final transformer block and normalization layers.
Their submission includes an extensive data-cleaning pipeline covering structural corrections,
noise removal, and text normalization to improve robustness. A weighted cross-entropy loss
function was used to mitigate class imbalance. This efficient fine-tuning strategy produced a
macro F1 of 0.6028, securing the third rank in Task 1 and a macro F1 of 0.7432, securing the fourth
rank in Task 2.
4. Team SVNIT (Sardar Vallabhbhai National Institute of Technology) explored recurrent neural
architectures with Word2Vec [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], GloVe [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and FastText [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] embeddings for hierarchical
classification, ultimately selecting a GRU-based model [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] enhanced with attention [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] for
Level-wise predictions. For the Q&amp;A task, they employed a Siamese GRU network [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] that
embeds questions and answers and predicts their semantic relevance. Their recurrent approach
performed reliably on both subtasks, yielding a Task 1 macro F1 of 0.5222, securing the fourth
rank, and a Task 2 macro F1 of 0.7575, securing the third rank in this subtask.
5. Team MSEC (Meenakshi Sundararajan Engineering College) used a BERT-base (uncased) model
[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] fine-tuned as a single recursive classifier predicting all hierarchical labels within a unified
output layer. Their approach incorporates lowercasing, WordPiece tokenization [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], attention masks, AdamW optimization [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and dropout regularization to adapt the pretrained model to
cryptocurrency-related social media text. The system obtained a Task 1 macro F1 of 0.2500, while
no submission was made for Task 2.
      </p>
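      <p>Several of these systems, most notably Team KLU’s, rely on imbalance-aware objectives. As a concrete
illustration, the focal loss [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] down-weights well-classified examples so that training concentrates on hard ones; the PyTorch
sketch below is a minimal rendition with illustrative hyperparameters, not any team’s exact implementation:</p>
      <preformat>
# Minimal focal-loss sketch in PyTorch (illustrative hyperparameters,
# not any team's exact implementation). Cross-entropy is rescaled by
# (1 - p_t)**gamma, so confident correct predictions contribute little.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    ce = F.cross_entropy(logits, targets, reduction="none")  # -log p_t
    p_t = torch.exp(-ce)             # model probability of the true class
    return ((1.0 - p_t) ** gamma * ce).mean()

logits = torch.randn(4, 8)           # batch of 4 posts, eight classes
targets = torch.tensor([0, 2, 5, 7])
print(focal_loss(logits, targets))
      </preformat>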
    </sec>
    <sec id="sec-5">
      <title>5. Methodologies</title>
      <p>The submitted solutions across teams participating in the CryptOQA shared task employed a range of
techniques for classifying cryptocurrency-related social media posts and determining the relevance of
question–answer pairs. These methodologies can be broadly categorized into four techniques, namely,
transformer-based models, hierarchical classification, recurrent neural architectures (LSTM/GRU), and
parameter-efficient LLM fine-tuning. To tackle the challenges posed by noisy, multi-platform data and
the multi-level label structure, each team adopted one or more of these methodological strategies.</p>
      <p><bold>Transformer-based Models.</bold> Transformer architectures remained the dominant choice due to
their ability to capture long-range dependencies and contextual relationships through self-attention.
These models are particularly well-suited for domain-specific and informal language prevalent in
cryptocurrency discourse. Several teams fine-tuned pre-trained transformer encoders, such as DeBERTa
and BERT, for both subtasks.</p>
      <p>• DeBERTa-V3-Small was used by Team KLU, integrating multiple task-specific classification
heads and specialized loss functions to improve discriminability across the hierarchical label
space.
• BERT and CryptoBERT were employed by Team NLPFusion in a multi-task learning framework,
allowing the model to jointly learn representations for all hierarchical levels and Q&amp;A relevance
prediction.
• BERT-base (uncased) was utilized by Team MSEC as the primary encoder for a single unified
hierarchical classifier fine-tuned on cleaned and normalized text from all platforms.</p>
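      <p>A minimal sketch of this shared fine-tuning recipe, written with the Hugging Face transformers API
(the checkpoint name, example posts, and hyperparameters are illustrative assumptions, not any team’s
exact configuration):</p>
      <preformat>
# Sketch of fine-tuning a pretrained encoder for the flat eight-class
# setting (illustrative checkpoint and hyperparameters).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "microsoft/deberta-v3-small"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=8)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

posts = ["BTC just broke $100k!", "How do I stake ETH?"]   # toy examples
labels = torch.tensor([4, 5])                              # toy gold codes

batch = tokenizer(posts, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=labels)    # cross-entropy computed internally
out.loss.backward()
optimizer.step()
      </preformat>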
      <p><bold>Hierarchical Classification.</bold> Several teams incorporated the hierarchical structure of the CryptOQA
labels directly into their modeling pipelines. Hierarchical classification enables predictions to be made
at increasing levels of granularity while reducing confusion between semantically distant classes.
• Hierarchical multi-head design was implemented by Team KLU, in which Level 1 outputs
are conditionally routed to deeper classification heads at Levels 2 and 3, enabling fine-grained
discrimination among sentiment and question categories.
• Multi-task hierarchical modeling was adopted by Team NLPFusion, where shared transformer
encoders simultaneously optimize for Level 1, Level 2, Level 3, and Q&amp;A loss functions, leveraging
inter-level dependencies without requiring separate models for each stage.
• Recursive hierarchical prediction was incorporated in Team MSEC’s BERT-based system,
where all hierarchical outputs are produced within a single unified classifier, simplifying the
inference pipeline.</p>
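      <p>The conditional-routing idea behind these designs can be sketched as follows: a shared encoder
produces a pooled representation, one linear head per level scores its classes, and deeper heads are
consulted only along the Subjective-to-Neutral branch. The PyTorch fragment below uses our own naming
and omits training details; it is an illustration, not any team’s actual code:</p>
      <preformat>
# Simplified conditional-routing sketch (our own naming, not any team's
# actual code). One classification head per level; deeper heads matter
# only along the Subjective -> Neutral branch.
import torch
import torch.nn as nn

NAMES1 = ["Noise", "Objective", "Subjective"]
NAMES2 = ["Neutral", "Negative", "Positive"]
NAMES3 = ["Neutral Sentiments", "Question", "Advertisement", "Miscellaneous"]

class HierarchicalHeads(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        self.level1 = nn.Linear(hidden, 3)
        self.level2 = nn.Linear(hidden, 3)
        self.level3 = nn.Linear(hidden, 4)

    def forward(self, pooled):
        l1 = self.level1(pooled).argmax(-1)
        l2 = self.level2(pooled).argmax(-1)
        l3 = self.level3(pooled).argmax(-1)
        labels = []
        for a, b, c in zip(l1, l2, l3):
            if a != 2:                  # route ends at level 1
                labels.append(NAMES1[a])
            elif b != 0:                # route ends at level 2
                labels.append(NAMES2[b])
            else:                       # Neutral branch continues
                labels.append(NAMES3[c])
        return labels

heads = HierarchicalHeads()
print(heads(torch.randn(2, 768)))       # e.g. ['Objective', 'Question']
      </preformat>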
      <p><bold>Recurrent Neural Models (LSTM/GRU).</bold> Recurrent neural networks were also explored due to
their effectiveness in modeling sequential dependencies and sentiment-bearing patterns in short social
media posts. These models remain competitive for noisy and domain-specific text when combined with
appropriate embeddings.</p>
      <p>• GRU-based hierarchical classifiers were adopted by Team SVNIT, enhanced with attention
mechanisms to focus on discriminative tokens across levels.
• Siamese GRU networks were applied by the same team for the Q&amp;A task, embedding questions
and answers separately and computing their semantic alignment for relevance prediction.
• LSTM-based architectures were explored in initial experimentation by some teams, although
GRU variants generally provided more stable performance in the final submitted runs.</p>
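      <p>The Siamese design for Q&amp;A relevance can be sketched as follows (our simplification of the idea in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]; vocabulary size and dimensions are illustrative): the question and the answer pass through the same
GRU encoder, and relevance is scored from the concatenation of the two final hidden states.</p>
      <preformat>
# Minimal Siamese-GRU sketch for Q&amp;A relevance (our simplification of
# [18]; dimensions are illustrative). Question and answer share one GRU.
import torch
import torch.nn as nn

class SiameseGRU(nn.Module):
    def __init__(self, vocab=30000, dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)   # shared weights
        self.clf = nn.Linear(2 * dim, 2)                # relevant / not

    def encode(self, tokens):
        _, h = self.gru(self.embed(tokens))             # h: (1, batch, dim)
        return h.squeeze(0)

    def forward(self, question, answer):
        hq, ha = self.encode(question), self.encode(answer)
        return self.clf(torch.cat([hq, ha], dim=-1))    # logits over {0, 1}

model = SiameseGRU()
q = torch.randint(0, 30000, (2, 16))    # toy token-id batches
a = torch.randint(0, 30000, (2, 20))
print(model(q, a).shape)                # torch.Size([2, 2])
      </preformat>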
      <p><bold>Parameter-efficient LLM Fine-tuning.</bold> With the availability of compact open-weight LLMs, teams
also explored parameter-efficient tuning approaches to reduce computational cost while retaining strong
contextual understanding.</p>
      <p>• Gemma-1B was adapted by Team IISER by replacing the model’s output projection with an
eight-class head and updating only the final transformer block along with normalization layers,
significantly reducing training overhead.
• Selective layer unfreezing and lightweight optimization strategies were employed to
stabilize fine-tuning on the small task-specific datasets while mitigating overfitting.</p>
      <p>These approaches were accompanied by extensive data preprocessing steps, including text cleaning,
noise filtering, token normalization, and class-weighted loss functions, to compensate for the limited
parameter updates.</p>
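      <p>A sketch of the selective-unfreezing recipe is given below. Parameter names differ across model
families, so the substring matching here is an illustrative heuristic over a stand-in encoder, not Team
IISER’s exact code; the class weights are likewise invented for illustration:</p>
      <preformat>
# Sketch of parameter-efficient fine-tuning by selective unfreezing
# (illustrative heuristic over a stand-in encoder, not any team's code).
import torch
import torch.nn as nn
from transformers import AutoModel

backbone = AutoModel.from_pretrained("bert-base-uncased")   # stand-in model
head = nn.Linear(backbone.config.hidden_size, 8)            # eight-class head

for name, param in backbone.named_parameters():
    # Unfreeze only the last encoder block and all LayerNorm parameters.
    param.requires_grad = ("layer.11" in name) or ("LayerNorm" in name)

trainable = [p for p in backbone.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW([*trainable, *head.parameters()], lr=1e-4)

# Class-weighted cross-entropy to mitigate label imbalance (weights invented).
weights = torch.tensor([0.5, 1.0, 2.0, 2.0, 1.0, 1.0, 3.0, 3.0])
loss_fn = nn.CrossEntropyLoss(weight=weights)
      </preformat>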
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption>
          <p>Macro F1 scores of the participating teams (Task 1: Reddit subset and average across all platforms; Task 2: overall).</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Team</th>
              <th>Task 1 (Reddit)</th>
              <th>Task 1 (Avg.)</th>
              <th>Task 2</th>
            </tr>
          </thead>
          <tbody>
            <tr><td>KLU</td><td>1.0000</td><td>1.0000</td><td>1.0000</td></tr>
            <tr><td>NLPFusion</td><td>0.5694</td><td>0.6254</td><td>0.7940</td></tr>
            <tr><td>IISER</td><td>0.6102</td><td>0.6029</td><td>0.7432</td></tr>
            <tr><td>SVNIT</td><td>0.4744</td><td>0.5222</td><td>0.7575</td></tr>
            <tr><td>MSEC</td><td>0.2881</td><td>0.2500</td><td>N/A</td></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-6">
      <title>6. Results</title>
      <p>The results from the CryptOQA 2025 shared task highlight that transformer-based models demonstrate
clear superiority in categorizing cryptocurrency-related social media posts. Team KLU achieved the best
performance in Task 1, attaining a perfect macro F1 score of 1.00 across Reddit, Twitter, and YouTube
using a multi-stage DeBERTa-V3–based hierarchical architecture enriched with multiple loss functions
and ensembling. Their system substantially outperformed the remaining submissions. Team NLPFusion
followed with a macro F1 of 0.6253, supported by a multi-task transformer framework using BERT and
CryptoBERT encoders, while Team IISER’s parameter-eficient adaptation of Gemma-1B secured a close
macro F1 of 0.6028. Recurrent neural models performed moderately well, with Team SVNIT reporting a
macro F1 score of 0.5222, indicating that GRU-based architectures can still be competitive on noisier
platforms, despite lacking the representational power of transformers. Team MSEC’s unified
BERT-based classifier achieved a macro F1 of 0.2500, indicating difficulty in handling the deeper sentiment
granularity required at Level 2 and Level 3. Overall, transformer-driven approaches remained the most
successful for hierarchical classification.</p>
      <p>For the question-answering scenario in Task 2, the findings exhibit clearer variation across
architectures. Team KLU again attained the highest score with a macro F1 of 1.00, demonstrating the
strength of supervised contrastive and multi-loss training strategies for semantic relevance prediction.
Team NLPFusion achieved the next best performance with 0.7940, followed by Team SVNIT’s Siamese
GRU network, which obtained 0.7575, reflecting that recurrent similarity models remain effective for
paired-sentence inference. Team IISER’s lightweight Gemma-1B fine-tuned system reported 0.7432,
showing that selective parameter tuning can still yield strong generalization. Team MSEC did not submit
a ranked run for Task 2. In general, transformer-based systems dominated Task 1 with significantly
higher scores, while both transformers and recurrent Siamese models showed competitive performance
in the Q&amp;A task. All contributions and their final results are listed in Table 1.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>The CryptOQA 2025 shared task aimed to evaluate machine learning and NLP approaches for
hierarchical post classification and Q&amp;A relevance detection in cryptocurrency-related social media content.
Transformer-based models have proven to be most effective, with architectures built on DeBERTa-V3,
BERT, and CryptoBERT delivering the strongest results, exemplified by Team KLU’s multi-stage
transformer system achieving perfect macro F1 scores in both tasks. While recurrent architectures, such as
GRU-based classifiers, performed reliably, they remained less effective than transformers for making
deeper sentiment distinctions. Parameter-efficient LLM tuning, as seen with Gemma-1B, also showed
competitive potential, particularly under computational constraints. Hierarchical modeling benefited
from multi-task transformer strategies, though label imbalance and fine-grained categories continued
to pose challenges. For Q&amp;A relevance, both transformer models and Siamese GRU networks produced
strong outcomes, with transformers ultimately leading.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used Grammarly in order to: check grammar and
spelling, and reword text. After using this tool/service, the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Effects of Social Media-Based Peer Opinions on the Prices of Cryptocurrency Options</article-title>
          ,
          <source>Journal of Futures Markets</source>
          <volume>45</volume>
          (
          <year>2025</year>
          )
          <fpage>1512</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rouhani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Abedin</surname>
          </string-name>
          ,
          <article-title>Crypto-Currencies Narrated on Tweets: A Sentiment Analysis Approach</article-title>
          ,
          <source>International Journal of Ethics and Systems</source>
          <volume>36</volume>
          (
          <year>2020</year>
          )
          <fpage>58</fpage>
          -
          <lpage>72</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Padfield</surname>
          </string-name>
          ,
          <article-title>Advanced Techniques in Profiling Cryptocurrency Influencers: A Review</article-title>
          ,
          <source>International Journal of Blockchains and Cryptocurrencies</source>
          <volume>5</volume>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks</article-title>
          , arXiv:1908.10084 (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Enhancing Answer Selection in Community Question Answering with Pre-Trained and Large Language Models</article-title>
          , arXiv:2311.17502 (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>DeBERTa: Decoding-Enhanced BERT with Disentangled Attention</article-title>
          , in: International Conference on Learning Representations,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Girshick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Dollar</surname>
          </string-name>
          ,
          <article-title>Focal Loss for Dense Object Detection</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis &amp; Machine Intelligence</source>
          <volume>42</volume>
          (
          <year>2020</year>
          )
          <fpage>318</fpage>
          -
          <lpage>327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Milletari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Navab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ahmadi</surname>
          </string-name>
          , V-Net:
          <article-title>Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation</article-title>
          ,
          <source>in: 4th International Conference on 3D Vision (3DV)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>565</fpage>
          -
          <lpage>571</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Teterwak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Isola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maschinot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <article-title>Supervised Contrastive Learning</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>33</volume>
          (
          <year>2020</year>
          )
          <fpage>18661</fpage>
          -
          <lpage>18673</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Szegedy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vanhoucke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ioffe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shlens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wojna</surname>
          </string-name>
          ,
          <article-title>Rethinking the Inception Architecture for Computer Vision</article-title>
          , in
          <source>: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>2818</fpage>
          -
          <lpage>2826</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          , BERT:
          <article-title>Pre-Training of Deep Bidirectional Transformers for Language Understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>Gemma Team</string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mesnard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hardin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dadashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bhupatiraju</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Pathak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sifre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riviere</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Kale</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Love</surname>
          </string-name>
          , et al.,
          <article-title>Gemma: Open Models Based on Gemini Research and Technology</article-title>
          , arXiv:2403.08295 (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Corrado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>Efficient Estimation of Word Representations in Vector Space</article-title>
          , arXiv:1301.3781 (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <article-title>GloVe: Global Vectors for Word Representation</article-title>
          ,
          <source>in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          ,
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>5</volume>
          (
          <year>2017</year>
          )
          <fpage>135</fpage>
          -
          <lpage>146</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Van</given-names>
            <surname>Merrienboer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gulcehre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bougares</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schwenk</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation</article-title>
          , arXiv:1406.1078 (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bahdanau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Cho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Neural Machine Translation by Jointly Learning to Align and Translate</article-title>
          , arXiv:1409.0473 (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mueller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thyagarajan</surname>
          </string-name>
          ,
          <article-title>Siamese Recurrent Architectures for Learning Sentence Similarity</article-title>
          ,
          <source>in: Proceedings of the AAAI Conference on Artificial Intelligence</source>
          , volume
          <volume>30</volume>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schuster</surname>
          </string-name>
          , Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey, et al.,
          <article-title>Google’s Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation</article-title>
          , arXiv:1609.08144 (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>I.</given-names>
            <surname>Loshchilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Decoupled Weight Decay Regularization</article-title>
          , arXiv:1711.05101 (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>