<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CLEF 2025 Working Notes</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Beyond Binary: 7-Class Sexism Identification via ModernBERT and SCL</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dongjie Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haoliang Qi</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan, Guangdong</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This work presents a novel approach for sexism identification in social media (EXIST 2025 Task 1) by reformulating the binary classification problem into a seven-class task. We implement ModernBERT-large - a state-of-the-art bidirectional transformer - with layered learning rate decay for hierarchical feature optimization. The model is enhanced with Supervised Contrastive Learning (SCL) to improve discrimination of nuanced sexism expressions through metric learning. Our architecture incorporates: (1) Task reformulation from binary to fine-grained seven-class prediction, (2) ModernBERT's memory-efficient attention mechanisms for long-context understanding, and (3) Hybrid CE+SCL loss (λ = 0.9) for robust representation learning. Experiments demonstrate significant performance gains over baseline methods in both hard and soft evaluation settings.</p>
      </abstract>
      <kwd-group>
        <kwd>ModernBERT</kwd>
        <kwd>SCL</kwd>
        <kwd>7-Class</kwd>
        <kwd>Sexism Detection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Social media platforms have become ubiquitous channels for communication, activism, and social
discourse. However, they also facilitate the proliferation of harmful content, including explicit and
implicit forms of sexism—prejudice or discrimination based on gender, predominantly targeting women
and marginalized groups. Sexist content ranges from overt misogyny to subtle linguistic cues, such as
stereotyping, objectification, and victim-blaming, which normalize gender-based violence and inequality.
Automated detection of such content is critical for creating safer online spaces, yet it remains challenging
due to the subjective nature of sexism interpretation, where annotator demographics (e.g., gender, age,
cultural background) significantly influence labeling decisions[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        The EXIST (sEXism Identification in Social neTworks)[
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] shared task at CLEF addresses this
challenge through a hierarchical classification framework. Task 1, the focus of this work, is a binary
classification problem aiming to identify whether a social media post (tweet, meme, or video) contains
sexist expressions or behaviors. Unique to EXIST is its Learning with Disagreement (LeWiDi)[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
paradigm, which embraces annotator subjectivity[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] by providing multiple labels per instance. This
paradigm rejects the notion of a single "gold label," instead training models to learn from diverse
perspectives and disagreements among annotators.
      </p>
      <p>
        To tackle Task 1, we propose a novel approach that diverges from conventional binary classification.
We reformulate the binary task into a seven-class problem, where each class represents a distinct
combination of annotator votes (e.g., 4 "YES" votes + 2 "NO" votes → Class 4). This granular transformation
explicitly models the spectrum of disagreement among annotators, allowing the model to capture
nuanced subjectivity inherent in sexism annotation. Our architecture leverages ModernBERT-large[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
a state-of-the-art transformer optimized for contextual understanding and long-range dependencies.
ModernBERT’s[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] enhanced bidirectional attention mechanism excels at detecting implicit biases and
sarcasm—common in sexist content—where meaning hinges on subtle contextual cues.
      </p>
      <p>
        Further, we integrate Supervised Contrastive Learning (SCL) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] into the training pipeline. By
combining cross-entropy loss with a contrastive objective, we enforce clustering of embeddings from
semantically similar inputs while separating dissimilar ones. This dual-loss framework enhances feature
discrimination, particularly valuable for distinguishing ambiguous cases (e.g., covert sexism vs.
nonsexist criticism). Our method aligns with EXIST’s soft evaluation protocol (Soft-Soft), where systems
predict probability distributions mirroring annotator label distributions, measured via the Information
Contrast Measure (ICM)[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        This work marks the first application of seven-class reformulation, ModernBERT-large[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and SCL[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
fusion to sexism detection under the LeWiDi paradigm. Our approach addresses the core challenge of
subjectivity while advancing robustness in identifying sexist content across social media.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Task and Datasets</title>
      <sec id="sec-2-1">
        <title>2.1. Task Overview</title>
        <p>Task 1 of the EXIST 2025 challenge focuses on sexism identification in social media posts, formulated
as a binary classification problem. The primary objective is to determine whether a given social media
post (tweet, meme, or video) contains sexist expressions or behaviors. This task addresses the critical
need for automated detection of gender-based discrimination in online spaces, which ranges from overt
misogyny to subtle linguistic cues such as stereotyping, objectification, and victim-blaming.</p>
        <p>A distinctive feature of EXIST is its Learning with Disagreement (LeWiDi) paradigm.
Recognizing the subjective nature of sexism interpretation, each instance is annotated by multiple annotators
with diverse socio-demographic backgrounds (gender, age, ethnicity, etc.). This approach intentionally
captures annotator subjectivity, rejecting the notion of a single "gold label" and instead training models
to learn from diverse perspectives and disagreements.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Datasets</title>
        <p>The dataset comprises 10,034 annotated social media posts in English and Spanish, curated from
mainstream platforms. Table 1 details the distribution across training, development, and test sets. Key
characteristics include:
• Multilingual content: Balanced representation of English (48.5%) and Spanish (51.5%)
• Rich annotation metadata: Each post includes annotator demographics (gender, age, ethnicity)
and per-annotator labels
• Disagreement modeling: 6 independent annotations per instance, explicitly capturing labeling
subjectivity</p>
        <p>Data collection followed strict ethical guidelines, with annotations performed by diverse annotator
pools recruited through Prolific. The inter-annotator disagreement rate averages 32.7%, reflecting the
inherent subjectivity in sexism identification.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Data Structure</title>
          <p>The dataset is provided in JSON format, with each instance (tweet) represented as a JSON object
containing the following attributes:
• "id_EXIST": Unique identifier for the tweet.
• "lang": Language of the tweet text ("en" for English, "es" for Spanish).
• "tweet": Text content of the tweet.
• "number_annotators": Number of annotators who labeled the tweet.
• "annotators": List of unique identifiers for each annotator.
• "gender_annotators": List of genders of the annotators ("F" for female, "M" for male).
• "age_annotators": List of age groups of the annotators ("18-22", "23-45", "46+").
• "ethnicity_annotators": List of self-reported ethnicities (e.g., "Black or African American",
"Hispano or Latino").
• "study_level_annotators": List of educational levels (e.g., "High school degree or
equivalent", "Bachelor’s degree").
• "country_annotators": List of countries where annotators reside.
• "labels_task1": List of labels (one per annotator) indicating sexist content ("YES" or "NO").
• "labels_task2": List of labels (one per annotator) for source intention ("DIRECT", "REPORTED",
"JUDGEMENTAL", "-", "UNKNOWN").
• "labels_task3": List of arrays (one per annotator) indicating sexism types (e.g., "IDEOLOGICAL-INEQUALITY",
"STEREOTYPING-DOMINANCE").</p>
          <p>• "split": Subset ("TRAIN", "DEV", "TEST" + language suffix).</p>
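<p>To make the schema concrete, here is a minimal Python sketch of loading one instance. The field names follow the list above; the sample values are invented for illustration and do not come from the dataset.</p>

```python
import json

# A hypothetical instance following the schema described above;
# the values are illustrative, not actual dataset content.
sample = json.loads("""
{
  "id_EXIST": "100001",
  "lang": "en",
  "tweet": "Example tweet text.",
  "number_annotators": 6,
  "annotators": ["a1", "a2", "a3", "a4", "a5", "a6"],
  "gender_annotators": ["F", "F", "F", "M", "M", "M"],
  "age_annotators": ["18-22", "23-45", "46+", "18-22", "23-45", "46+"],
  "labels_task1": ["YES", "YES", "NO", "YES", "NO", "YES"],
  "split": "TRAIN_EN"
}
""")

# Count "YES" votes for Task 1 -- the quantity the seven-class
# reformulation of Section 2.3 is built on.
yes_votes = sample["labels_task1"].count("YES")
print(yes_votes)  # 4
```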
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Difficulty Levels</title>
          <p>Instances are categorized into three difficulty levels based on annotator agreement:
• Easy: Full consensus (6 identical annotations).
• Medium: Partial consensus (4–5 identical annotations).</p>
          <p>• Hard: High disagreement (≤ 3 identical annotations).</p>
          <p>This stratification reflects the inherent subjectivity in sexism identification, where ambiguous cases
(e.g., sarcasm, implicit stereotypes) yield lower agreement.</p>
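<p>A minimal helper implementing this stratification (the function name is ours):</p>

```python
from collections import Counter

def difficulty_level(labels):
    """Map a list of 6 annotator labels to the EXIST difficulty level,
    based on the size of the largest agreeing group."""
    majority = Counter(labels).most_common(1)[0][1]
    if majority == 6:
        return "Easy"      # full consensus
    if majority >= 4:
        return "Medium"    # partial consensus (4-5 identical)
    return "Hard"          # high disagreement (<= 3 identical)

print(difficulty_level(["YES"] * 6))               # Easy
print(difficulty_level(["YES"] * 4 + ["NO"] * 2))  # Medium
print(difficulty_level(["YES"] * 3 + ["NO"] * 3))  # Hard
```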
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Data Preprocessing</title>
        <p>To explicitly model annotator disagreement, we reformulate the binary task into a 7-class problem,
where each class corresponds to a unique combination of annotator votes (e.g., 4 “YES” + 2 “NO” →
Class 4). The preprocessing pipeline includes:
• Vote Aggregation: Counting “YES” votes (0–6) per instance.
• Class Assignment: Mapping vote counts to discrete classes (0–6).
• Soft Label Conversion: For evaluation, class values are converted to probabilistic “YES”/“NO”
scores (e.g., Class 4 → “YES”= 4/6, “NO”= 2/6) to align with the soft evaluation protocol.</p>
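<p>The pipeline above can be sketched in a few lines of Python (the function names are ours):</p>

```python
def to_seven_class(labels_task1):
    """Vote aggregation + class assignment: the class index is
    simply the number of 'YES' votes (0-6)."""
    return labels_task1.count("YES")

def to_soft_label(cls, n_annotators=6):
    """Soft label conversion: Class k -> P(YES) = k/6, P(NO) = 1 - k/6."""
    return {"YES": cls / n_annotators, "NO": 1 - cls / n_annotators}

labels = ["YES", "YES", "YES", "YES", "NO", "NO"]
cls = to_seven_class(labels)   # Class 4
soft = to_soft_label(cls)      # {"YES": 4/6, "NO": 2/6}
```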
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Evaluation Metrics</title>
        <p>The evaluation of the proposed models was conducted using the Information Contrast Measure (ICM),
a similarity function that generalizes Pointwise Mutual Information (PMI) and assesses the alignment
between system outputs and ground truth categories in classification tasks. The official evaluation
included two modes: Hard-Hard and Soft-Soft. In the Hard-Hard evaluation, the system’s single
predicted label was compared to the majority-voted ground truth label, while the Soft-Soft evaluation
compared the system’s probability distribution to the annotators’ label distribution. Additionally,
Cross-Entropy was used to provide a comprehensive assessment of model performance. The evaluation
metrics are summarized in Table 2.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <sec id="sec-3-1">
        <title>3.1. Task Formulation</title>
        <p>[Table 2 (evaluation metrics): ICM-Soft measures similarity between the predicted probability
distribution and the ground-truth distribution; ICM-Hard measures similarity between predicted hard
labels and majority-voted ground-truth labels; ICM-Soft Norm and ICM-Hard Norm are the normalized
scores for comparative analysis; Cross-Entropy evaluates the difference between predicted probabilities
and true distributions.]</p>
        <p>We reformulate the binary sexism identification task (Task 1) as a seven-class classification problem,
where each class represents a distinct combination of annotator votes (0–6 “YES” votes). Given an input
tweet text x, the model predicts a class y ∈ {0, 1, ..., 6} corresponding to the aggregated annotator
votes.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Architecture</title>
        <p>Our architecture combines ModernBERT-large with Supervised Contrastive Learning (SCL), as illustrated
in Figure 1.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. ModernBERT Encoder</title>
          <p>For an input text x, ModernBERT generates contextual embeddings:</p>
          <p>H = ModernBERT(x) ∈ R^(L×d) (1)</p>
          <p>where L is the sequence length and d = 1024 is the hidden size. We extract the [CLS] token representation h = H[0] and apply L2 normalization for downstream tasks:</p>
          <p>z = h / ‖h‖₂ (2)</p>
          <p>The normalized embedding z is used for both classification and contrastive learning.</p>
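<p>The extraction and normalization steps can be sketched with NumPy (the random H merely stands in for the encoder output; real use would run the Hugging Face ModernBERT checkpoint to obtain H):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
L, d = 128, 1024              # sequence length and hidden size (d = 1024 for ModernBERT-large)
H = rng.normal(size=(L, d))   # stand-in for H = ModernBERT(x)

h = H[0]                      # [CLS] token representation h = H[0]
z = h / np.linalg.norm(h)     # L2 normalization: z = h / ||h||_2

# The normalized embedding has unit norm, as required for the
# cosine-style similarities used in the contrastive loss.
assert abs(np.linalg.norm(z) - 1.0) < 1e-9
```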
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Layered Learning Rate Decay</title>
          <p>
            We apply layer-specific learning rates with exponential decay:
LR_l = LR_base × α^(N − l) (3)
where:
• α = 0.8 (decay rate[
            <xref ref-type="bibr" rid="ref9">9</xref>
            ])
• N = 24 (total layers)
• l is the layer index
          </p>
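<p>A sketch of the resulting per-layer rates in plain Python (in practice these values would populate per-layer optimizer parameter groups; the function name is ours):</p>

```python
def layerwise_learning_rates(lr_base=1e-5, alpha=0.8, num_layers=24):
    """Per-layer learning rates: LR_l = lr_base * alpha**(num_layers - l).
    The top layer (l = num_layers) keeps the base rate; lower layers
    are scaled down geometrically by the decay rate alpha."""
    return {l: lr_base * alpha ** (num_layers - l)
            for l in range(1, num_layers + 1)}

rates = layerwise_learning_rates()
assert rates[24] == 1e-5     # top layer trains at the base rate
assert rates[1] < rates[24]  # bottom layers are scaled down
```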
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Supervised Contrastive Learning</title>
          <p>For a batch of N samples, we compute the SCL loss using the normalized embeddings z:</p>
          <p>ℒ_SCL = −(1/N) Σ_{i=1}^{N} (1/|P(i)|) Σ_{p∈P(i)} log [ exp(z_i · z_p / τ) / Σ_{a=1}^{N} 1[a≠i] exp(z_i · z_a / τ) ] (4)</p>
          <p>
            where:
• P(i) = set of positives (samples sharing the same class label as i)
• τ = 0.3 (temperature parameter[
            <xref ref-type="bibr" rid="ref10">10</xref>
            ])
• 1[a≠i] is an indicator function equaling 1 when a ≠ i
          </p>
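<p>A NumPy sketch of this supervised contrastive objective (the function and the toy batch are ours; in training the loss operates on the normalized [CLS] embeddings with τ = 0.3):</p>

```python
import numpy as np

def supcon_loss(z, labels, tau=0.3):
    """Supervised contrastive loss over L2-normalized embeddings z (N x d):
    for each anchor i, positives P(i) are the other samples with the same
    class label; the denominator runs over all a != i."""
    n = len(labels)
    sim = z @ z.T / tau                       # pairwise scaled similarities
    not_self = ~np.eye(n, dtype=bool)         # indicator 1[a != i]
    log_denom = np.log((np.exp(sim) * not_self).sum(axis=1))
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue                          # anchors without positives are skipped
        total -= sum(sim[i, p] - log_denom[i] for p in positives) / len(positives)
        anchors += 1
    return total / max(anchors, 1)

# Toy batch: 8 embeddings, 4 classes with 2 samples each (labels illustrative).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
z /= np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize, as in Section 3.2.1
labels = [0, 0, 1, 1, 2, 2, 3, 3]
loss = supcon_loss(z, labels)
assert loss > 0  # each log-softmax term is negative, so the loss is positive
```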
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. Hybrid Loss</title>
          <p>
            The total loss combines Cross-Entropy (CE) and SCL:
ℒ_total = (1 − λ)ℒ_CE + λℒ_SCL (5)
with λ = 0.9 controlling the balance[
            <xref ref-type="bibr" rid="ref11">11</xref>
            ].
          </p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Training Protocol</title>
        <p>• Optimizer: AdamW with β₁ = 0.9, β₂ = 0.999
• Batch Size: 16 (gradient accumulation for an effective batch size of 64)
• Learning Rate: 1e-5 with linear warmup (10% of steps)
• Epochs: 6 with early stopping</p>
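<p>A framework-agnostic sketch of the batch and schedule bookkeeping implied by these settings (total_steps is hypothetical, and the post-warmup linear decay is our assumption; the protocol above only specifies the warmup):</p>

```python
micro_batch_size = 16
effective_batch_size = 64
# Gradient accumulation: number of micro-batches per optimizer update.
accum_steps = effective_batch_size // micro_batch_size  # 4

total_steps = 1000       # hypothetical; depends on dataset size and epochs
warmup_fraction = 0.10   # linear warmup over the first 10% of steps

def lr_schedule(step, base_lr=1e-5):
    """Linear warmup, then (assumed) linear decay to zero."""
    warmup = int(total_steps * warmup_fraction)
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup))

assert accum_steps == 4
```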
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Evaluation Metrics</title>
        <p>We evaluate using:
1. Hard-Hard: Accuracy vs. the majority vote
2. Soft-Soft: ICM (Information Contrast Measure):</p>
        <p>ICM(P, Q) = Σ_{c∈C} P(c) log(P(c) / Q(c)) (6)</p>
        <p>where P = predicted distribution, Q = annotator distribution.</p>
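<p>As a hedged illustration of this divergence-style expression for ICM(P, Q) (the official EXIST metric has a more elaborate information-theoretic definition; this sketch only mirrors the formula given here, and the function name is ours):</p>

```python
import math

def icm_like(p, q, eps=1e-12):
    """Divergence-style score: sum_c p(c) * log(p(c) / q(c)), with
    p = predicted distribution and q = annotator distribution (dicts over
    classes). eps guards against zero probabilities in the denominator."""
    return sum(pc * math.log((pc + eps) / (q.get(c, 0.0) + eps))
               for c, pc in p.items() if pc > 0)

pred = {"YES": 4 / 6, "NO": 2 / 6}
gold = {"YES": 4 / 6, "NO": 2 / 6}
print(icm_like(pred, gold))  # 0.0 when the distributions match exactly
```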
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <p>All experiments were conducted on a single NVIDIA A800 GPU with 80GB memory. The
hyperparameters were carefully tuned to optimize model performance, following the configurations used in our
baseline implementations.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Model Configuration</title>
          <p>
            Our architecture combines ModernBERT-large[
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] with Supervised Contrastive Learning (SCL),
implementing the following key components:
• Base Model: ModernBERT-large[
            <xref ref-type="bibr" rid="ref6">6</xref>
            ] (1024 hidden size, 24 layers)
• SCL Temperature (τ): 0.3
• Loss Weighting (λ): 0.9 (CE:SCL ratio)
• Classification Head: Single linear layer (1024 → 7)
          </p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Training Parameters</title>
          <p>The training protocol employed the following hyperparameters, summarized in Table 3:</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>Our experimental results demonstrate the effectiveness of the proposed approach in both Soft-Soft
and Hard-Hard evaluation settings. Table 4 presents the performance comparison under the Soft-Soft
evaluation protocol, while Table 5 shows results for the Hard-Hard setting. The official evaluation
metrics include Information Contrast Measure (ICM), normalized ICM (ICM Norm), and Cross-Entropy
for Soft-Soft evaluation, with an additional F1-score for the Hard-Hard setting.</p>
        <p>Our system achieved competitive performance in both evaluation settings. Under the Soft-Soft
protocol, fosu-students_2 ranked 9th out of 66 submissions with an ICM-Soft score of 0.6663 (ICM-Soft
Norm: 0.6070), significantly outperforming both majority-class (-2.1991) and minority-class (-3.8158)
baselines. The Cross-Entropy value of 1.5069 indicates reasonable alignment with the annotator
distribution, though there remains room for improvement compared to the gold standard (0.5770).</p>
        <p>In the Hard-Hard evaluation, fosu-students_3 secured 12th position with an ICM-Hard score of 0.5661
(ICM-Hard Norm: 0.7889) and F1-score of 0.7638 for the “YES” class. This represents a substantial
improvement over the non-informative baselines, demonstrating our model’s ability to capture majority
voting patterns while maintaining balanced performance across classes. The normalized ICM-Hard
score of 0.7889 suggests our predictions align well with the consensus labels, achieving approximately
79% of the perfect score.</p>
        <p>Note that fosu-students_2 and fosu-students_3 denote variations optimized for the Soft-Soft and
Hard-Hard evaluations respectively, using identical architectures but different loss weightings in the
hybrid loss.</p>
        <p>The performance gap between Soft-Soft and Hard-Hard results suggests our approach handles
clear-cut cases (Hard-Hard) more effectively than ambiguous instances with annotator disagreement
(Soft-Soft). This observation aligns with the challenge’s Learning with Disagreement paradigm, where
modeling subjective interpretations remains an open research problem. Future work should focus on
improving the probabilistic outputs to better capture annotator subjectivity in the Soft-Soft setting.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>
        This work presents a novel approach for sexism identification in social media by fundamentally
reformulating the binary classification task into a fine-grained seven-class problem that explicitly models
annotator disagreement. Our seven-class framework—where each class represents a distinct
combination of annotator votes (0–6 YES)—proves highly effective in capturing the inherent subjectivity
of sexism annotation, addressing a core limitation of traditional binary models. The integration of
ModernBERT-large[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], with its enhanced bidirectional attention and long-context capabilities, enables
superior detection of implicit biases and contextual nuances prevalent in sexist content. Further
performance gains are achieved through Supervised Contrastive Learning (SCL), which enhances feature
discrimination via a hybrid CE+SCL loss (λ = 0.9), particularly improving robustness for ambiguous
cases. Experiments under EXIST 2025’s Learning with Disagreement (LeWiDi) paradigm demonstrate
significant improvements: our system ranked 9th/66 in Soft-Soft evaluation (ICM-Soft: 0.6663) and
12th/157 in Hard-Hard evaluation (F1: 0.7638), substantially outperforming majority/minority
baselines. This validates that jointly modeling annotation subjectivity through seven-class reformulation,
advanced contextual understanding via ModernBERT[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and discriminative feature learning via SCL
offers a powerful framework for nuanced sexism detection.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Natural Science Foundation of China (No. 62276064).
During the preparation of this work, the author(s) used DeepSeek for grammar and spelling checking.
After using this tool/service, the author(s) reviewed and edited the content as needed and take full
responsibility for the publication's content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.-H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Juan</surname>
          </string-name>
          , W.-L. Tseng, H.
          <string-name>
            <surname>-H. Chen</surname>
            ,
            <given-names>Y.-H.</given-names>
          </string-name>
          <string-name>
            <surname>Tseng</surname>
          </string-name>
          ,
          <article-title>Mining browsing behaviors for objectionable content filtering</article-title>
          ,
          <source>Journal of the Association for Information Science and Technology</source>
          <volume>66</volume>
          (
          <year>2015</year>
          )
          <fpage>930</fpage>
          -
          <lpage>942</lpage>
          . URL: https://asistdl.onlinelibrary.wiley.com/doi/abs/10.1002/asi.23217. doi:10.1002/asi.23217.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carrillo-de Albornoz</surname>
          </string-name>
          , I. Arcos,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Spina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Amigó</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Morante</surname>
          </string-name>
          , Overview of exist 2025:
          <article-title>Learning with disagreement for sexism identification and characterization in tweets, memes, and tiktok videos</article-title>
          , in: J.
          <string-name>
            <surname>Carrillo-de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>García Seco de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer Nature Switzerland, Cham,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. C. de Albornoz</surname>
            , I. Arcos,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Amigó</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Morante</surname>
          </string-name>
          , Overview of exist 2025:
          <article-title>Learning with disagreement for sexism identification and characterization in tweets, memes, and tiktok videos (extended overview)</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>CLEF 2025 Working Notes, CEUR Workshop Proceedings</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T.</given-names>
            <surname>Fornaciari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hovy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Paun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Plank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Poesio</surname>
          </string-name>
          ,
          <article-title>Learning from disagreement: A survey</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>72</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>41</lpage>
          . URL: https://www.jair.org/index.php/jair/article/view/12752. doi:10.1613/jair.1.12752.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Fahlén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wallberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hansson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Ståhl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Söderberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-P.</given-names>
            <surname>Åkesson</surname>
          </string-name>
          ,
          <article-title>Socially intelligent interfaces for increased energy awareness in the home</article-title>
          , in: C.
          <string-name>
            <surname>Floerkemeier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Langheinrich</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Fleisch</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Mattern</surname>
          </string-name>
          , S. E. Sarma (Eds.),
          <source>The Internet of Things</source>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2008</year>
          , pp.
          <fpage>263</fpage>
          -
          <lpage>275</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Warner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chafin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Clavié</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Weller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hallström</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Taghadouini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gallagher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Biswas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ladhak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Aarsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Cooper</surname>
          </string-name>
          , G. Adams,
          <string-name>
            <given-names>J.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Poli</surname>
          </string-name>
          ,
          <article-title>Smarter, better, faster, longer: A modern bidirectional encoder for fast, memory efficient, and long context finetuning and inference</article-title>
          , 2024
          . URL: https://arxiv.org/abs/2412.13663. arXiv:2412.13663.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Teterwak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Isola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maschinot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <article-title>Supervised contrastive learning</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2004.11362. arXiv:2004.11362.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Muresan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Villavicencio</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , Association for Computational Linguistics, Dublin, Ireland,
          <year>2022</year>
          . URL: https://aclanthology.org/2022.acl-long.0/.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Howard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ruder</surname>
          </string-name>
          ,
          <article-title>Universal language model fine-tuning for text classification</article-title>
          ,
          <year>2018</year>
          . URL: https://arxiv.org/abs/1801.06146. arXiv:1801.06146.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Saunshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Misra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kakade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <article-title>Understanding contrastive learning requires incorporating inductive biases</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2202.14037. arXiv:2202.14037.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>O.</given-names>
            <surname>Sener</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Koltun</surname>
          </string-name>
          ,
          <article-title>Multi-task learning as multi-objective optimization</article-title>
          ,
          <year>2019</year>
          . URL: https://arxiv.org/abs/1810.04650. arXiv:1810.04650.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>