<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Llama-3 with 4-bit Quantization and IA3 Tuning for Multi-Author Writing Style Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dongjie Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jijie Li</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haoliang Qi</string-name>
        </contrib>
        <aff id="aff0">
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan, Guangdong</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The Multi-Author Writing Style Analysis task aims to detect authorship changes within documents, which is critical for plagiarism detection and authorship verification. This paper introduces a novel approach combining Llama-3-8B, 4-bit quantization, and IA3 fine-tuning to address this challenge. Our method efficiently adapts large language models to style change detection while minimizing computational costs. Evaluated on the PAN 2025 dataset (Easy/Medium/Hard tasks), our approach achieves F1 scores of 0.461 (Easy), 0.583 (Medium), and 0.484 (Hard), outperforming baselines by +5.0%, +32.5%, and +6.8%, respectively. The results demonstrate IA3's effectiveness in capturing stylistic features, especially under limited topical diversity.</p>
      </abstract>
      <kwd-group>
        <kwd>Writing Style Analysis</kwd>
        <kwd>IA3 Tuning</kwd>
        <kwd>4-bit Quantization</kwd>
        <kwd>Llama-3</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Task and Datasets</title>
      <sec id="sec-2-1">
        <title>2.1. Task Overview</title>
        <p>The Multi-Author Writing Style Analysis task in PAN 2025 aims to identify sentence-level authorial
changes within multi-author documents. Specifically, for each pair of consecutive sentences, the task
requires determining whether a writing style change has occurred. The challenge is designed to evaluate
models’ ability to distinguish stylistic variations while controlling for topic shifts, with three difficulty
levels:
• Easy: The sentences of a document cover a variety of topics, allowing approaches to make use of
topic information to detect authorship changes.
• Medium: The topical variety in a document is small (though still present), forcing the approaches
to focus more on style to effectively solve the detection task.
• Hard: All sentences in a document are on the same topic.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Datasets</title>
        <p>The datasets are derived from user posts on Reddit, combined into documents with controlled authorial
and topic changes. Each dataset is split into training (70%), validation (15%), and test (15%) sets, and
provided in English. Key characteristics include:</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Data Structure</title>
          <p>For each problem instance (document), two files are provided:
• problem-X.txt: The text document, formatted as sentences.
• truth-problem-X.json: Ground truth in JSON format, containing:
– authors: The number of authors.
– changes: A binary array where each element indicates whether a style change occurs
between consecutive sentences (1 for change, 0 for no change).</p>
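          <p>For illustration, a hypothetical truth-problem-X.json for a four-sentence document written by two authors could look like this (values invented for this sketch):</p>
          <preformat>
{
  "authors": 2,
  "changes": [0, 1, 0]
}
          </preformat>
          <p>Note that the changes array has one entry per consecutive sentence pair, i.e., one fewer entry than there are sentences.</p>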
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Data Preprocessing</title>
          <p>The input text is preprocessed to remove redundant empty lines and special characters, ensuring
consistency. Adjacent sentences are paired to form input samples for model training, with each pair
labeled as a style change (1) or no change (0). For sequences exceeding 512 tokens [7], truncation is
applied to fit model input constraints.</p>
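          <p>A minimal sketch of this pairing step, assuming one sentence per line in problem-X.txt (function and variable names are ours, not from the task definition):</p>
          <preformat>
import json

def build_pairs(txt_path, truth_path):
    """Build labeled sentence pairs for one problem instance."""
    with open(txt_path, encoding="utf-8") as f:
        # one sentence per line; drop redundant empty lines
        sentences = [line.strip() for line in f if line.strip()]
    with open(truth_path, encoding="utf-8") as f:
        changes = json.load(f)["changes"]  # 1 = style change, 0 = no change
    assert len(changes) == len(sentences) - 1
    return [
        {"text_a": sentences[i], "text_b": sentences[i + 1], "label": changes[i]}
        for i in range(len(sentences) - 1)
    ]
          </preformat>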
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Evaluation Metrics</title>
        <p>Submissions are evaluated using the macro F1-score, which balances precision and recall across all
sentence pairs. The metric is computed independently for each difficulty level (Easy, Medium, Hard) to
assess model performance under varying conditions. A provided script facilitates evaluation based on
the output JSON files, which must follow the format of the ground truth (i.e., a changes array of binary
values for each sentence pair).</p>
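        <p>For illustration, per-pair predictions can be scored with scikit-learn's macro F1. This mirrors the official script's metric but is not that script, and the labels below are invented:</p>
        <preformat>
from sklearn.metrics import f1_score

y_true = [0, 1, 0, 1, 1]  # ground-truth changes across all sentence pairs
y_pred = [0, 1, 1, 1, 0]  # predicted changes, in the same order
print(f1_score(y_true, y_pred, average="macro"))
        </preformat>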
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>Our approach processes sentence pairs through four key stages: (1) input tokenization and embedding,
(2) quantized transformer processing with IA3-adapted attention, (3) style feature extraction via
modified feedforward networks, and (4) binary classification. The system first tokenizes sentence pairs
with [SEP] markers, then processes them through Llama-3’s 4-bit quantized layers, where IA3 scaling
vectors adapt query/value projections to emphasize stylistic features. Final hidden states are classified
using a linear layer trained with cross-entropy loss.</p>
      <sec id="sec-3-1">
        <title>3.1. Task Formulation</title>
        <p>We frame the Multi-Author Writing Style Analysis as a binary classification task. Given a document
D = {s_1, s_2, ..., s_n} segmented into n sentences, we construct adjacent sentence pairs (s_i, s_(i+1)),
so a document with n sentences yields n − 1 pairs. The model predicts a binary label y_i ∈ {0, 1}, where:
• y_i = 0: consecutive sentences share the same author
• y_i = 1: an author change occurs between the sentences</p>
        <p>This formulation transforms the style change detection into a sequence classification problem at the
sentence-pair level.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Architecture</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Input Representation</title>
          <p>Our architecture integrates Meta-Llama-3-8B with 4-bit quantization and IA3 tuning. The computation
flow for a sentence pair (s_i, s_(i+1)) is defined as follows. We concatenate the two sentences with a
separator token and encode them using Llama-3’s tokenizer:</p>
          <p>x = Tokenizer(s_i ‖ [SEP] ‖ s_(i+1), max_length = 512, truncation = True)
where ‖ denotes concatenation and [SEP] is the separation token. The tokenized output includes:
x = {input_ids, attention_mask} ∈ R^512
where input_ids are token indices and attention_mask indicates non-padding tokens.</p>
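          <p>As an illustrative sketch, this encoding step can be written with the HuggingFace tokenizer as follows. Here [SEP] is inserted as a literal marker string, since Llama-3’s tokenizer defines no dedicated separator or padding token; those handling choices are our assumptions:</p>
          <preformat>
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tok.pad_token = tok.eos_token  # Llama-3 defines no pad token by default
s_i, s_next = "The weather was nice.", "Stocks fell sharply today."
x = tok(
    s_i + " [SEP] " + s_next,  # s_i ‖ [SEP] ‖ s_(i+1)
    max_length=512,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
print(x["input_ids"].shape)    # torch.Size([1, 512])
          </preformat>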
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Embedding Layer</title>
          <p>The tokenized input x is mapped to dense vector representations through an embedding layer:</p>
          <p>E = EmbeddingLayer(x)
where E ∈ R^(512×d) is the embedding matrix, d = 4096 is the hidden dimension size, and 512 is
the maximum sequence length. This transforms discrete tokens into continuous vectors suitable for
transformer processing.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Quantized Transformer Processing</title>
          <p>The embeddings are processed through 32 transformer layers with 4-bit quantized weights. For each
layer l ∈ [1, 32]:</p>
          <p>H^(0) = E
H^(l) = TransformerLayer_quant(H^(l−1))
where the weights are quantized using NF4 with double quantization [8]:
W_quant = Q_NF4(W), Q_NF4 = BlockwiseQuant(block_size = 64)
Here Q_NF4 denotes the quantization function, which reduces the memory footprint by 68% while
preserving model capacity.</p>
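          <p>A sketch of the corresponding bitsandbytes setup via transformers (parameter names follow the public API; the compute dtype and sequence-classification head are our assumptions):</p>
          <preformat>
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,   # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    num_labels=2,                     # binary: style change vs. no change
    quantization_config=bnb_config,
)
          </preformat>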
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. IA3 Attention Modification</title>
          <p>At each attention layer l, IA3 injects trainable scaling vectors l_q, l_v to adapt the query and value
projections:</p>
          <p>Q = (W_q ⊙ (1 + l_q)) H^(l−1)
V = (W_v ⊙ (1 + l_v)) H^(l−1)
Attention(Q, K, V) = softmax(QK^T / √d_k) V
where W_q and W_v are the original query and value projection matrices, ⊙ denotes element-wise
multiplication, and l_q, l_v ∈ R^d are task-specific learnable vectors that scale the projections. This
adaptation allows the model to dynamically adjust attention patterns for style analysis while keeping most
parameters frozen.</p>
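          <p>The effect of the scaling vectors can be illustrated with a toy PyTorch sketch (shapes and names are ours; this is not the full attention stack):</p>
          <preformat>
import torch

d = 8                                      # toy hidden size (4096 in the real model)
h = torch.randn(5, d)                      # H^(l-1): hidden states for 5 positions
W_v = torch.randn(d, d)                    # frozen value projection matrix
l_v = torch.zeros(d, requires_grad=True)   # IA3 vector; 1 + l_v = 1 at initialization

# (W_v ⊙ (1 + l_v)) H^(l-1): rescale each output feature of the projection
V = (1 + l_v) * (h @ W_v.T)
assert torch.allclose(V, h @ W_v.T)        # at init, the adapted layer matches the original
          </preformat>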
        </sec>
        <sec id="sec-3-2-6">
          <title>3.2.5. Feedforward Network Adaptation</title>
          <p>The feedforward network is similarly adapted using a scaling vector l_ff:</p>
          <p>FFN(x) = (W_down ⊙ (1 + l_ff)) σ(W_up x)
where W_down and W_up are the original down-projection and up-projection matrices respectively, σ
denotes the activation function (typically GELU), and l_ff ∈ R^(d_ff) is a learnable scaling vector. This
adaptation allows the feedforward network to specialize for style analysis tasks while maintaining
parameter efficiency through the lightweight l_ff adjustments.</p>
        </sec>
        <sec id="sec-3-2-7">
          <title>3.2.6. Final Hidden State Extraction</title>
          <p>The contextual representation at the final layer’s last token position is extracted:</p>
          <p>h = H^(32)[last] ∈ R^d
This token aggregates information from the entire sequence, capturing pairwise stylistic relationships.</p>
        </sec>
        <sec id="sec-3-2-8">
          <title>3.2.7. Classification Layer</title>
          <p>The hidden state is projected to class probabilities:
z = W h + b, W ∈ R^(2×d)
p(y) = Softmax(z)
where z is the logit vector and p(y) denotes the predicted probabilities for class labels 0 (no change) and 1
(change).</p>
        </sec>
        <sec id="sec-3-2-10">
          <title>3.2.8. Loss Calculation</title>
          <p>Binary cross-entropy loss optimizes the model parameters:
ℒ = −(1/N) Σ_(i=1..N) [y_i log p(y_i = 1) + (1 − y_i) log p(y_i = 0)]
where N is the batch size, y_i is the ground-truth label, and p(·) is the predicted probability.</p>
        </sec>
        <sec id="sec-3-2-9">
          <title>3.2.9. Implementation Details</title>
          <p>Key implementation specifications:
• Quantization: NF4 format with double quantization (BitsAndBytesConfig)
• IA3 Targets: q_proj, v_proj, down_proj modules
• Sequence Handling: padding/truncation to 512 tokens
• Optimization: AdamW (learning rate 3 × 10^−4, weight decay 0.01)</p>
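          <p>These specifications map onto the PEFT and PyTorch APIs roughly as follows (a sketch; model is assumed to be the 4-bit model loaded in Section 3.2.3, and listing down_proj under feedforward_modules follows the PEFT IA3 convention):</p>
          <preformat>
from torch.optim import AdamW
from peft import IA3Config, get_peft_model

ia3_config = IA3Config(
    task_type="SEQ_CLS",
    target_modules=["q_proj", "v_proj", "down_proj"],
    feedforward_modules=["down_proj"],  # FFN targets must also be listed here
)
model = get_peft_model(model, ia3_config)  # model: the quantized load from above
model.print_trainable_parameters()         # only IA3 vectors and the head train

optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
          </preformat>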
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <sec id="sec-4-1-1">
          <title>4.1.1. Datasets</title>
          <p>We evaluate our method on the PAN 2025 Multi-Author Writing Style Analysis dataset with three
difficulty levels:
• Easy: 4,200 training documents, 900 validation documents
• Medium: 4,200 training documents, 900 validation documents
• Hard: 4,200 training documents, 900 validation documents
Data is preprocessed into paragraph pairs with binary labels (change/no-change). Class distribution
analysis shows significant imbalance, particularly in the Easy task (a 1:10 ratio).</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Model Configuration</title>
          <p>• Base Model: Meta-Llama-3-8B
• Quantization: 4-bit NF4 with double quantization
• IA3 Targets: {q_proj, v_proj, down_proj}
• Classification Head: Single linear layer (4096 → 2)</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. Training Parameters</title>
          <p>The model was trained using the hyperparameters listed in Table 1.</p>
        </sec>
        <sec id="sec-4-1-3b">
          <title>4.1.4. Evaluation Metrics</title>
          <p>• Primary metric: Weighted F1-score (handles class imbalance)
• Secondary metrics: Accuracy, Precision, Recall
• Validation: per-epoch evaluation
• Early stopping: based on validation F1 improvement</p>
        </sec>
        <sec id="sec-4-1-4">
          <title>4.1.5. Implementation Environment</title>
          <p>• Hardware: NVIDIA A800 80GB GPU [9]
• Frameworks: PyTorch 2.0, HuggingFace Transformers, PEFT
• Training Time: ≈ 8 hours per task (3 epochs)</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>We finally submitted the model to TIRA [10]. Table 2 summarizes the performance comparison
between our IA3-tuned model (team hellojie) and the naive baseline that always predicts 0, across different
difficulty levels on the test set. The proposed approach achieves significant F1-score improvements in
all tasks, with the most substantial gain (+32.5%) observed in the Medium-difficulty task.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This study presents an efficient framework for multi-author writing style analysis by integrating
4-bit quantization and IA3 tuning with the Llama-3-8B model. Two findings stand out:
• Medium task dominance: the 32.5% F1 improvement over the naive baseline demonstrates IA3’s
efficacy in capturing subtle stylistic variations when topic diversity is limited.
• Consistent gains: improvements across all difficulty levels validate the robustness of our
quantization and tuning approach compared to the trivial baseline.
Our approach demonstrates three key advantages:</p>
      <p>1. Improved performance: significant F1 improvements across all difficulty levels (+5.0% Easy,
+32.5% Medium, +6.8% Hard), particularly excelling in medium-difficulty tasks where topic consistency
demands precise style discrimination.</p>
      <p>2. Computational efficiency: 4-bit quantization reduces memory requirements by 68% while
maintaining competitive accuracy, enabling deployment on resource-constrained systems.</p>
      <p>3. Task-specific adaptation: IA3’s targeted attention modulation (q_proj, v_proj) effectively
captures subtle stylistic variations without full parameter updates.</p>
      <p>The 32.5% F1 gain in medium-difficulty tasks confirms our hypothesis that IA3 tuning optimizes
style representation learning when topic signals are limited. Future work will explore: 1) dynamic
quantization for harder tasks, 2) multi-task learning across difficulty levels, and 3) hybrid approaches
combining syntactic features with our framework.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Natural Science Foundation of China (No. 62276064).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration On Generative AI</title>
      <p>During the preparation of this work, the author(s) used DeepSeek for grammar and spelling
checking. After using this tool/service, the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2025: Generative AI Authorship Verification, Multi-Author Writing Style Analysis, Multilingual Text Detoxification, and Generative Plagiarism Detection</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF 2025)</source>
          , Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the Multi-Author Writing Style Analysis Task at PAN 2025</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moskovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stakovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification Condensed Lab Overview</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Quénot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>G. M.</given-names>
          </string-name>
          <string-name>
            <surname>Di Nunzio</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuščáková</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>García Seco de Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>259</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          , LoRA:
          <article-title>Low-rank adaptation of large language models</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2106.09685. arXiv:2106.09685.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          , M.-A. Lachaux, T.
          <string-name>
            <surname>Lacroix</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Rozière</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Hambro</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Azhar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave, G. Lample,
          <article-title>Llama: Open and efficient foundation language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2302.13971. arXiv:2302.13971.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Muqeeth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mohta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <article-title>Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2205.05638. arXiv:2205.05638.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <article-title>Longformer: The long-document transformer</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2004.05150. arXiv:2004.05150.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dettmers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pagnoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          , L. Zettlemoyer, QLoRA: Efficient finetuning of quantized LLMs,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2305.14314. arXiv:2305.14314.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Korthikanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Casper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lym</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>McAfee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Andersch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shoeybi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Catanzaro</surname>
          </string-name>
          ,
          <article-title>Reducing activation recomputation in large transformer models</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2205.05198. arXiv:2205.05198.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kolyada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Grahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elstner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Loebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>Continuous Integration for Reproducible Shared Tasks with TIRA.io</article-title>
          , in: J.
          <string-name>
            <surname>Kamps</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Maistro</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Joho</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          <string-name>
            <surname>Kruschwitz</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Caputo (Eds.),
          <source>Advances in Information Retrieval. 45th European Conference on IR Research (ECIR</source>
          <year>2023</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2023</year>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>241</lpage>
          . URL: https://link.springer.com/chapter/10.1007/978-3-031-28241-6_20. doi:10.1007/978-3-031-28241-6_20.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>