<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Llama-3 with 4-bit Quantization and IA3 Tuning for Multi-Author Writing Style Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dongjie Chen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jijie Li</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haoliang Qi</string-name>
        </contrib>
        <aff id="aff0">
          <institution>Foshan University</institution>
          ,
          <addr-line>Foshan, Guangdong</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The Multi-Author Writing Style Analysis task aims to detect authorship changes within documents, which is critical for plagiarism detection and authorship verification. This paper introduces a novel approach combining Llama-3-8B, 4-bit quantization, and IA3 fine-tuning to address this challenge. Our method efficiently adapts large language models to style change detection while minimizing computational costs. Evaluated on the PAN 2025 dataset (Easy/Medium/Hard tasks), our approach achieves F1 scores of 0.461 (Easy), 0.583 (Medium), and 0.484 (Hard), outperforming baselines by +5.0%, +32.5%, and +6.8%, respectively. The results demonstrate IA3's effectiveness in capturing stylistic features, especially under limited topical diversity.</p>
      </abstract>
      <kwd-group>
        <kwd>Writing Style Analysis</kwd>
        <kwd>IA3 Tuning</kwd>
        <kwd>4-bit Quantization</kwd>
        <kwd>Llama-3</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Task and Datasets</title>
      <sec id="sec-2-1">
        <title>2.1. Task Overview</title>
        <p>The Multi-Author Writing Style Analysis task in PAN 2025 aims to identify sentence-level authorial
changes within multi-author documents. Specifically, for each pair of consecutive sentences, the task
requires determining whether a writing style change has occurred. The challenge is designed to evaluate
models’ ability to distinguish stylistic variations while controlling for topic shifts, with three difficulty
levels:
• Easy: The sentences of a document cover a variety of topics, allowing approaches to make use of
topic information to detect authorship changes.
• Medium: The topical variety in a document is small (though still present), forcing the approaches
to focus more on style to effectively solve the detection task.
• Hard: All sentences in a document are on the same topic.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Datasets</title>
        <p>The datasets are derived from user posts on Reddit, combined into documents with controlled authorial
and topic changes. Each dataset is split into training (70%), validation (15%), and test (15%) sets, and
provided in English. Key characteristics include:</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Data Structure</title>
          <p>For each problem instance (document), two files are provided:
• problem-X.txt: The text document, formatted as sentences.
• truth-problem-X.json: Ground truth in JSON format, containing:
– authors: The number of authors.
– changes: A binary array where each element indicates whether a style change occurs
between consecutive sentences (1 for change, 0 for no change).</p>
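          <p>For illustration, a hypothetical truth-problem-X.json for a four-sentence document written by two authors could look like this (values invented for this sketch):</p>
          <preformat>
{
  "authors": 2,
  "changes": [0, 1, 0]
}
          </preformat>
          <p>Note that the changes array has one entry per consecutive sentence pair, i.e., one fewer entry than there are sentences.</p>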
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Data Preprocessing</title>
          <p>The input text is preprocessed to remove redundant empty lines and special characters, ensuring
consistency. Adjacent sentences are paired to form input samples for model training, with each pair
labeled as a style change (1) or no change (0). For sequences exceeding 512 tokens [7], truncation is
applied to fit model input constraints.</p>
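          <p>A minimal sketch of this pairing step, assuming one sentence per line in problem-X.txt (function and variable names are ours, not from the task definition):</p>
          <preformat>
import json

def build_pairs(txt_path, truth_path):
    """Build labeled sentence pairs for one problem instance."""
    with open(txt_path, encoding="utf-8") as f:
        # one sentence per line; drop redundant empty lines
        sentences = [line.strip() for line in f if line.strip()]
    with open(truth_path, encoding="utf-8") as f:
        changes = json.load(f)["changes"]  # 1 = style change, 0 = no change
    assert len(changes) == len(sentences) - 1
    return [
        {"text_a": sentences[i], "text_b": sentences[i + 1], "label": changes[i]}
        for i in range(len(sentences) - 1)
    ]
          </preformat>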
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Evaluation Metrics</title>
        <p>Submissions are evaluated using the macro F1-score, which balances precision and recall across all
sentence pairs. The metric is computed independently for each difficulty level (Easy, Medium, Hard) to
assess model performance under varying conditions. A provided script facilitates evaluation based on
the output JSON files, which must follow the format of the ground truth (i.e., a changes array of binary
values for each sentence pair).</p>
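        <p>For illustration, per-pair predictions can be scored with scikit-learn's macro F1. This mirrors the official script's metric but is not that script, and the labels below are invented:</p>
        <preformat>
from sklearn.metrics import f1_score

y_true = [0, 1, 0, 1, 1]  # ground-truth changes across all sentence pairs
y_pred = [0, 1, 1, 1, 0]  # predicted changes, in the same order
print(f1_score(y_true, y_pred, average="macro"))
        </preformat>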
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>Our approach processes sentence pairs through four key stages: (1) input tokenization and embedding,
(2) quantized transformer processing with IA3-adapted attention, (3) style feature extraction via
modified feedforward networks, and (4) binary classification. The system first tokenizes sentence pairs
with [SEP] markers, then processes them through Llama-3’s 4-bit quantized layers, where IA3 scaling
vectors adapt query/value projections to emphasize stylistic features. Final hidden states are classified
using a linear layer trained with cross-entropy loss.</p>
      <sec id="sec-3-1">
        <title>3.1. Task Formulation</title>
        <p>We frame the Multi-Author Writing Style Analysis as a binary classification task. Given a document
D = {s_1, s_2, ..., s_n} segmented into n sentences, we construct adjacent sentence pairs (s_i, s_(i+1)),
so a document with n sentences yields n − 1 pairs. The model predicts a binary label y_i ∈ {0, 1}, where:
• y_i = 0: consecutive sentences share the same author
• y_i = 1: an author change occurs between the sentences</p>
        <p>This formulation transforms the style change detection into a sequence classification problem at the
sentence-pair level.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Model Architecture</title>
        <sec id="sec-3-2-1">
          <title>3.2.1. Input Representation</title>
          <p>Our architecture integrates Meta-Llama-3-8B with 4-bit quantization and IA3 tuning. The computation
flow for a sentence pair (s_i, s_(i+1)) is defined as follows. We concatenate the two sentences with a
separator token and encode them using Llama-3’s tokenizer:</p>
          <p>x = Tokenizer(s_i ‖ [SEP] ‖ s_(i+1), max_length = 512, truncation = True)
where ‖ denotes concatenation and [SEP] is the separation token. The tokenized output includes:
x = {input_ids, attention_mask} ∈ R^512
where input_ids are token indices and attention_mask indicates non-padding tokens.</p>
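          <p>As an illustrative sketch, this encoding step can be written with the HuggingFace tokenizer as follows. Here [SEP] is inserted as a literal marker string, since Llama-3’s tokenizer defines no dedicated separator or padding token; those handling choices are our assumptions:</p>
          <preformat>
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tok.pad_token = tok.eos_token  # Llama-3 defines no pad token by default
s_i, s_next = "The weather was nice.", "Stocks fell sharply today."
x = tok(
    s_i + " [SEP] " + s_next,  # s_i ‖ [SEP] ‖ s_(i+1)
    max_length=512,
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
print(x["input_ids"].shape)    # torch.Size([1, 512])
          </preformat>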
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. Embedding Layer</title>
          <p>The tokenized input x is mapped to dense vector representations through an embedding layer:</p>
          <p>E = EmbeddingLayer(x)
where E ∈ R^(512×d) is the embedding matrix, d = 4096 is the hidden dimension size, and 512 is
the maximum sequence length. This transforms discrete tokens into continuous vectors suitable for
transformer processing.</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>3.2.3. Quantized Transformer Processing</title>
          <p>The embeddings are processed through 32 transformer layers with 4-bit quantized weights. For each
layer l ∈ [1, 32]:</p>
          <p>H^(0) = E
H^(l) = TransformerLayer_quant(H^(l−1))
where the weights are quantized using NF4 with double quantization [8]:
W_quant = Q_NF4(W), Q_NF4 = BlockwiseQuant(block_size = 64)
Here Q_NF4 denotes the quantization function, which reduces the memory footprint by 68% while
preserving model capacity.</p>
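          <p>A sketch of the corresponding bitsandbytes setup via transformers (parameter names follow the public API; the compute dtype and sequence-classification head are our assumptions):</p>
          <preformat>
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",        # NormalFloat4 quantization
    bnb_4bit_use_double_quant=True,   # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    num_labels=2,                     # binary: style change vs. no change
    quantization_config=bnb_config,
)
          </preformat>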
        </sec>
        <sec id="sec-3-2-4">
          <title>3.2.4. IA3 Attention Modification</title>
          <p>At each attention layer l, IA3 injects trainable scaling vectors l_q, l_v to adapt the query and value
projections:</p>
          <p>Q = (W_q ⊙ (1 + l_q)) H^(l−1)
V = (W_v ⊙ (1 + l_v)) H^(l−1)
Attention(Q, K, V) = softmax(QK^T / √d_k) V
where W_q and W_v are the original query and value projection matrices, ⊙ denotes element-wise
multiplication, and l_q, l_v ∈ R^d are task-specific learnable vectors that scale the projections. This
adaptation allows the model to dynamically adjust attention patterns for style analysis while keeping most
parameters frozen.</p>
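          <p>The effect of the scaling vectors can be illustrated with a toy PyTorch sketch (shapes and names are ours; this is not the full attention stack):</p>
          <preformat>
import torch

d = 8                                      # toy hidden size (4096 in the real model)
h = torch.randn(5, d)                      # H^(l-1): hidden states for 5 positions
W_v = torch.randn(d, d)                    # frozen value projection matrix
l_v = torch.zeros(d, requires_grad=True)   # IA3 vector; 1 + l_v = 1 at initialization

# (W_v ⊙ (1 + l_v)) H^(l-1): rescale each output feature of the projection
V = (1 + l_v) * (h @ W_v.T)
assert torch.allclose(V, h @ W_v.T)        # at init, the adapted layer matches the original
          </preformat>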
        </sec>
        <sec id="sec-3-2-6">
          <title>3.2.5. Feedforward Network Adaptation</title>
          <p>The feedforward network is similarly adapted using a scaling vector l_ff:</p>
          <p>FFN(x) = (W_down ⊙ (1 + l_ff)) σ(W_up x)
where W_down and W_up are the original down-projection and up-projection matrices respectively, σ
denotes the activation function (typically GELU), and l_ff ∈ R^(d_ff) is a learnable scaling vector. This
adaptation allows the feedforward network to specialize for style analysis tasks while maintaining
parameter efficiency through the lightweight l_ff adjustments.</p>
        </sec>
        <sec id="sec-3-2-7">
          <title>3.2.6. Final Hidden State Extraction</title>
          <p>The contextual representation at the final layer’s last token position is extracted:</p>
          <p>h = H^(32)[last] ∈ R^d
This token aggregates information from the entire sequence, capturing pairwise stylistic relationships.</p>
        </sec>
        <sec id="sec-3-2-8">
          <title>3.2.7. Classification Layer</title>
          <p>The hidden state is projected to class probabilities:
z = W h + b, W ∈ R^(2×d)
p(y) = Softmax(z)
where z is the logit vector and p(y) denotes the predicted probabilities for class labels 0 (no change) and 1
(change).</p>
        </sec>
        <sec id="sec-3-2-10">
          <title>3.2.8. Loss Calculation</title>
          <p>Binary cross-entropy loss optimizes the model parameters:
ℒ = −(1/N) Σ_(i=1..N) [y_i log p(y_i = 1) + (1 − y_i) log p(y_i = 0)]
where N is the batch size, y_i is the ground-truth label, and p(·) is the predicted probability.</p>
        </sec>
        <sec id="sec-3-2-9">
          <title>3.2.9. Implementation Details</title>
          <p>Key implementation specifications:
• Quantization: NF4 format with double quantization (BitsAndBytesConfig)
• IA3 Targets: q_proj, v_proj, down_proj modules
• Sequence Handling: padding/truncation to 512 tokens
• Optimization: AdamW (learning rate 3 × 10^−4, weight decay 0.01)</p>
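          <p>These specifications map onto the PEFT and PyTorch APIs roughly as follows (a sketch; model is assumed to be the 4-bit model loaded in Section 3.2.3, and listing down_proj under feedforward_modules follows the PEFT IA3 convention):</p>
          <preformat>
from torch.optim import AdamW
from peft import IA3Config, get_peft_model

ia3_config = IA3Config(
    task_type="SEQ_CLS",
    target_modules=["q_proj", "v_proj", "down_proj"],
    feedforward_modules=["down_proj"],  # FFN targets must also be listed here
)
model = get_peft_model(model, ia3_config)  # model: the quantized load from above
model.print_trainable_parameters()         # only IA3 vectors and the head train

optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
          </preformat>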
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <sec id="sec-4-1">
        <title>4.1. Experimental Setup</title>
        <sec id="sec-4-1-1">
          <title>4.1.1. Datasets</title>
          <p>We evaluate our method on the PAN 2025 Multi-Author Writing Style Analysis dataset with three
difficulty levels:
• Easy: 4,200 training documents, 900 validation documents
• Medium: 4,200 training documents, 900 validation documents
• Hard: 4,200 training documents, 900 validation documents
Data is preprocessed into paragraph pairs with binary labels (change/no-change). Class distribution
analysis shows significant imbalance, particularly in the Easy task (a 1:10 ratio).</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Model Configuration</title>
          <p>• Base Model: Meta-Llama-3-8B
• Quantization: 4-bit NF4 with double quantization
• IA3 Targets: {q_proj, v_proj, down_proj}
• Classification Head: Single linear layer (4096 → 2)</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.1.3. Training Parameters</title>
          <p>The model was trained using the hyperparameters listed in Table 1.</p>
        </sec>
        <sec id="sec-4-1-3b">
          <title>4.1.4. Evaluation Metrics</title>
          <p>• Primary metric: Weighted F1-score (handles class imbalance)
• Secondary metrics: Accuracy, Precision, Recall
• Validation: per-epoch evaluation
• Early stopping: based on validation F1 improvement</p>
        </sec>
        <sec id="sec-4-1-4">
          <title>4.1.5. Implementation Environment</title>
          <p>• Hardware: NVIDIA A800 80GB GPU [9]
• Frameworks: PyTorch 2.0, HuggingFace Transformers, PEFT
• Training Time: ≈ 8 hours per task (3 epochs)</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>We finally submitted the model to TIRA [10]. Table 2 summarizes the performance comparison
between our IA3-tuned model (team hellojie) and the naive baseline that always predicts 0, across different
difficulty levels on the test set. The proposed approach achieves significant F1-score improvements in
all tasks, with the most substantial gain (+32.5%) observed in the Medium-difficulty task.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This study presents an efficient framework for multi-author writing style analysis by integrating
4-bit quantization and IA3 tuning with the Llama-3-8B model. Two findings stand out:
• Medium task dominance: the 32.5% F1 improvement over the naive baseline demonstrates IA3’s
efficacy in capturing subtle stylistic variations when topic diversity is limited.
• Consistent gains: improvements across all difficulty levels validate the robustness of our
quantization and tuning approach compared to the trivial baseline.
Our approach demonstrates three key advantages:</p>
      <p>1. Improved performance: significant F1 improvements across all difficulty levels (+5.0% Easy,
+32.5% Medium, +6.8% Hard), particularly excelling in medium-difficulty tasks where topic consistency
demands precise style discrimination.</p>
      <p>2. Computational efficiency: 4-bit quantization reduces memory requirements by 68% while
maintaining competitive accuracy, enabling deployment on resource-constrained systems.</p>
      <p>3. Task-specific adaptation: IA3’s targeted attention modulation (q_proj, v_proj) effectively
captures subtle stylistic variations without full parameter updates.</p>
      <p>The 32.5% F1 gain in medium-difficulty tasks confirms our hypothesis that IA3 tuning optimizes
style representation learning when topic signals are limited. Future work will explore: 1) dynamic
quantization for harder tasks, 2) multi-task learning across difficulty levels, and 3) hybrid approaches
combining syntactic features with our framework.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work is supported by the National Natural Science Foundation of China (No. 62276064).</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration On Generative AI</title>
      <p>During the preparation of this work, the author(s) used DeepSeek for grammar and spelling
checking. After using this tool/service, the author(s) reviewed and edited the content as needed and
take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Gipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Greiner-Petter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Karlgren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shelmanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2025: Generative AI Authorship Verification, Multi-Author Writing Style Analysis, Multilingual Text Detoxification, and Generative Plagiarism Detection</article-title>
          , in:
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Fourteenth International Conference of the CLEF Association (CLEF 2025)</source>
          , Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zangerle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Overview of the Multi-Author Writing Style Analysis Task at PAN 2025</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Ayele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Babakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X. B.</given-names>
            <surname>Casals</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chulvi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dementieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elnagar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Korenčić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mayerl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moskovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Panchenko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Rangel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Rizwan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Smirnova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stamatatos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Stakovskii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Taulé</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ustalov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Yimam</surname>
          </string-name>
          , E. Zangerle,
          <article-title>Overview of PAN 2024: Multi-author Writing Style Analysis, Multilingual Text Detoxification, Oppositional Thinking Analysis, and Generative AI Authorship Verification Condensed Lab Overview</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Quénot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>G. M.</given-names>
          </string-name>
          <string-name>
            <surname>Di Nunzio</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuščáková</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>García Seco de Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction</source>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>259</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Wallis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Allen-Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          , LoRA:
          <article-title>Low-rank adaptation of large language models</article-title>
          ,
          <year>2021</year>
          . URL: https://arxiv.org/abs/2106.09685. arXiv:2106.09685.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          , M.-A. Lachaux, T.
          <string-name>
            <surname>Lacroix</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Rozière</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Hambro</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Azhar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave, G. Lample,
          <article-title>Llama: Open and efficient foundation language models</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2302.13971. arXiv:2302.13971.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Muqeeth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mohta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bansal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Raffel</surname>
          </string-name>
          ,
          <article-title>Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2205.05638. arXiv:2205.05638.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>I.</given-names>
            <surname>Beltagy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohan</surname>
          </string-name>
          ,
          <article-title>Longformer: The long-document transformer</article-title>
          ,
          <year>2020</year>
          . URL: https://arxiv.org/abs/2004.05150. arXiv:2004.05150.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Dettmers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pagnoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Holtzman</surname>
          </string-name>
          , L. Zettlemoyer, QLoRA: Efficient finetuning of quantized LLMs,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2305.14314. arXiv:2305.14314.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Korthikanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Casper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lym</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>McAfee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Andersch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shoeybi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Catanzaro</surname>
          </string-name>
          ,
          <article-title>Reducing activation recomputation in large transformer models</article-title>
          ,
          <year>2022</year>
          . URL: https://arxiv.org/abs/2205.05198. arXiv:2205.05198.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kolyada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Grahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elstner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Loebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>Continuous Integration for Reproducible Shared Tasks with TIRA.io</article-title>
          , in: J.
          <string-name>
            <surname>Kamps</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Crestani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Maistro</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Joho</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Davis</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          <string-name>
            <surname>Kruschwitz</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Caputo (Eds.),
          <source>Advances in Information Retrieval. 45th European Conference on IR Research (ECIR</source>
          <year>2023</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2023</year>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>241</lpage>
          . URL: https://link.springer.com/chapter/10.1007/978-3-031-28241-6_20. doi:10.1007/978-3-031-28241-6_20.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>