<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pirate Passau at Touché: Do We Need to Get Complex? A Comparative Analysis of Traditional and Advanced NLP Approaches for Advertisement Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tarek Al Bouhairi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alaa Alhamzeh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Passau</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This paper presents our contribution to the Touché shared task at CLEF 2025: Advertisement in RetrievalAugmented Generation (RAG) - Sub-task 2. The task focuses on determining whether a system-generated response to a user query contains advertising content. We compare a variety of classification approaches, ranging from a traditional TF-IDF + Random Forest baseline to transformer-based methods such as sentence transformers (MiniLM, MPNet), few-shot classification with large language models (LLaMA 3.1, Qwen 2.5), and a RetrievalAugmented Generation (RAG) pipeline that grounds LLM predictions in semantically similar examples. Each method is evaluated on the oficial task dataset using standard classification metrics: precision, recall, F1-score, and accuracy. Among all tested approaches, the fine-tuned sentence transformer (all-MiniLM-L6-v2) achieved the best performance, recording an F1-score of 0.97 on the test set. Our findings suggest that while prompt-based LLMs and RAG approaches ofer flexibility, fine-tuned transformers remain the most efective for this task under the given conditions.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Advertisement Classification</kwd>
        <kwd>Sentence Transformers</kwd>
        <kwd>Few-Shot Learning</kwd>
        <kwd>Retrieval-Augmented Generation</kwd>
        <kwd>Large Language Models</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Online information systems have content made by users and systems and it may be used to teach or to
promote a product. To make search results useful, support good argument searching and avoid biased
information, we should be able to diferentiate between reliable info and ads. The goal of identifying
advertisements in this task forms the core of Task 2 at CLEF 2025. When talking about advertising,
advertisements are seen as tools that promote a product, service, or event by using persuasive or
marketing language.</p>
      <p>In this work, we examine the diferent techniques used to classify binary data. It is important for us to
investigate how performance changes as we contrast traditional machine learning models, transformers
for working with sentence data, prompt-based methods with large language models (LLMs) and models
that use data retrieval for language generation.</p>
      <p>
        Starting, a baseline using TF-IDF vectorization combined with a random forest [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] classifier was
used. Each text is represented as a vector, with features being terms weighted by their TF-IDF values. It
shows which terms exist and how well they inform the corpus, based on how often they are used in it.
The random forest model uses a group of decision trees to sort out these vectors. The method is fully
interpretable, can run on any computer and provides a very eficient benchmark. Then, we analyze
the performance of fine-tuned sentence transformers for this task. We rely on two pretrained models:
all-MiniLM-L6-v2 and mpnet-base-v2 which both have a classification head attached. They are trained
in their entirety to learn special ways of processing input text for immediate classification.
      </p>
      <p>Then, we test few-shot learning with LLaMA 3.1 by feeding a handful of examples and an instruction
and only relying on learning in context rather than training the model. With this approach, LLMs don’t
need to process huge datasets since they can adapt to new tasks using only format examples.</p>
      <p>Finally, we implement a Retrieval-Augmented Generation (RAG) approach. In this setup, we use
sentence embeddings to retrieve semantically similar examples from a labeled corpus via FAISS, rerank
them using a cross-encoder, and provide the most relevant examples to an LLM for classification. This
method allows the model to look at diferent examples which are the most similar to the query.</p>
      <p>Our contributions in this paper are threefold:
1. We apply and compare a diverse set of methods, ranging from simple classifiers to prompt-based</p>
      <p>LLMs, for the binary classification of advertisements.
2. We develop a modular RAG pipeline that combines semantic retrieval and LLM reasoning to
dynamically ground predictions in relevant training data.
3. We provide a detailed evaluation of all approaches using standard classification metrics such as
precision, recall, F1-Score, accuracy and confusion matrix.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Traditional approaches such as bag of words and TF-IDF have served as a baseline for diferent Natural
Language Processing (NLP) tasks. Although, these techniques change raw text into simple, low-density
vector forms with term frequency, so it becomes easy to use machine learning to categorize them [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Nevertheless, they lack understanding the deep meaning of words and sentences, so they might face
dificulties for tasks such as advertisement detection.
      </p>
      <p>To overcome these limitations, sentence transformers have emerged as powerful alternatives. These
models, pre-trained on large-scale natural language data, encode text into dense vector embeddings that
better reflect semantic similarity. While using sentence embeddings as static features already provides
substantial improvements over lexical models, fine-tuning the transformer on the task-specific data
further enhances performance by tailoring the representation space to the classification objective.</p>
      <p>To overcome these challenges, sentence transformers can be used as a powerful solution. these
models, pre-trained on a vast amount of data, are encoded into dense vector representations that better
reflect semantic similarity. Even without fine-tuning the model, it also can be used as an improvement
over lexical models. With fine-tuning the models, the internal weights of the model will start mimicking
the specific task and give better results.</p>
      <p>To build on this, exploring the capabilities of large language models (LLMs) with few-shot learning
was an option. As a few examples to help the model understand what is an advertisement and what is
not. A range of LLMs and sentence transformers were selected for these experiments, varying in size
and architecture, as summarized in Table 1.</p>
      <p>Model
sentence-transformers/all-MiniLM-L6-v2
sentence-transformers/all-mpnet-base-v2
Llama3.1 (Meta)
Qwen 2.5 (Google)</p>
      <p>Size
22.7M
109M
70B
72B</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methods</title>
      <p>In this section, we examine several methods, ranging from basic machine learning approaches to
advanced models based on language understanding, for categorizing binary advertisements. Every
method is examined alone to determine its advantages, weaknesses and usefulness in solving the task
in Touché Task 2. All experiments are kept as consistent as possible by using the same training and test
subsets for every method. With the help of a high-performance computer system and an NVIDIA A100
80GB PCIe GPU, all experiments were optimized, making training, inference and embedding operations
easy for transformer-based and retrieval-augmented models.</p>
      <sec id="sec-3-1">
        <title>3.1. TF-IDF + Random Forest</title>
        <p>
          For a baseline, we implemented a pipeline which has TF-IDF vectorization as a feature extractor
and random forest for classification. This approach could be a baseline for comparison with modern
transformer-based and LLM-based methods [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>The response text from the training data will be transformed to numerical values using the Term
Frequency-Inverse Document Frequency (TF-IDF) method. TF-IDF is a statistical method which is
used to evaluate how similar words are to each other in a large collection of text. We get this score by
multiplying the term’s frequency in the document by its inverse document frequency to make common
words less important.</p>
        <p>This results in a high-dimensional sparse feature vector where each dimension corresponds to a word
or token, and the value encodes how relevant that word is to the document.</p>
        <p>To enhance generalization and minimize noise, we eliminated common English words and excluded
terms that appeared in over 95% of the documents.</p>
        <p>These features were then passed to a random forest classifier. This classifier builds a collection of
decision trees, with each tree trained on a bootstrap sample of the data and utilizing a random selection
of features at every split. The ultimate prediction is achieved by majority voting among all trees. In
our configuration, we set up the classifier using 100 decision trees to guarantee adequate diversity
in the ensemble. To tackle the issue of class imbalance, frequently seen in advertisement datasets,
class_weight=’balanced’ setting was used in the random forest classifier. As a result, the model
assigns higher importance to the minority class (typically the advertisements), helping to reduce bias
toward the majority class during training and improving the model’s ability to detect less frequent but
important positive instances. We set the random seed to 42 to guarantee that results remain consistent
across diferent runs.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Sentence Transformer Embeddings</title>
        <p>To understand how well sentence transformers that show the semantic representation in the
advertisement classification task, two approaches have been used: the first treats the sentence transformer as a
feature extractor which generates embeddings that are classified by a diferent machine learning model,
in this case, random forest. The second involves end-to-end-fine-tuning, which makes the model adapt
its internal weights specifically for the task.</p>
        <sec id="sec-3-2-1">
          <title>3.2.1. Sentence Embedding-Based Classification with Random Forest</title>
          <p>The first approach to the embeddings of the transformer in sentences is to use a hybrid architecture that
combines dense semantic embeddings generated from a pre-trained transformer model with a classical
machine learning classifier. The goal is to check how efectively static sentence transformer works
when combined with a standard classifier for deciding if an advertisement is promoting something or
not.</p>
          <p>
            "all-MiniLM-L6-v2" sentence transformer was used to transform each sentence into a 384-dimensional
dense representation. By embedding the sentence, we extract its semantic message, which involves
details such as tone, intention and the way the language is put together [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ]. Significantly, the transformer
is only applied in a frozen mode, which means none of its internal elements are modified during the
feedforward pass. It consists only of an extractor part, providing a repeatable representation for both
training and testing.
          </p>
          <p>An embedding model generates results that are fed into a standard machine learning model for
training. To capture the best results, we use the random forest classifier that unites several decision tree
models. Each tree in the ensemble works out how to order the space for classifying something as an
ad or a diferent type of media. Labels in the dataset are created by people who give each sentence an
advertisement status (label 1) or not (label 0). Whilst training, random forest selects 100 estimators that
help ensure the dataset is statistically significant and sets up fixed weights to correct issues with labels
distribution. Estimators refer to the number of trees in your trained forest. Decreased variation and
better strength in forecasts are achieved by adding more trees, though it means training takes longer.
When the training process ends, the model is serialized and saved to a data file for use at any time.</p>
          <p>The methods keep the same sentence transformer, encoding every new sentence from the test set
by turning it into a 384-dimensional vector. After the data is turned into embeddings, it is sent to the
random forest for prediction, which outputs only a 1 or a 0. The prediction reveals whether the text is
advertising or not. All the decision trees in the ensemble send their results to the model, which then
decides by voting for the most popular outcome.</p>
          <p>Using this approach brings various benefits. By separating the two phases, the embeddings are
eficient to use with several classifiers. Although the transformer is not fine-tuned, the high-quality
semantic data provided by the pre-training model helps this pipeline. The findings of this approach give
us a good and clear point for evaluating other more advanced models such as fine-tuned transformers
and retrieval-aided generation approaches.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>3.2.2. End-to-End Fine-Tuning of Sentence Transformers for Advertisement Detection</title>
          <p>The second approach was using Sentence Transformers as a feature extractor and as a classifier. In
this approach, a pre-trained sentence transformer embedding model was trained on the training and
validation dataset for the binary classification task by training a custom classification head on top
of the transformer model. This would help the model to change its internal weights and parameters
specifically for this specific task of advertisement detection. Each experiment used a diferent Sentence
Transformer model. One was the all-MiniLM-L6-v2 and the other was mpnet-base-v2.</p>
          <p>
            In both experiments, a single-layer feedforward classifier has been added to the sentence transformer.
The classifier comprised a linear layer that got the sentence embedding (768- or 384-dimensional, again
depending on the model) and returned only one scalar value: the unnormalized logit for the positive
class. Specifically, for a sentence embedding e ∈ R, the classifier computes the logit as:
logit = w⊤e + 
where  is 768 for MPNet 1 and 384 for MiniLM 2. This scalar logit is then passed through a sigmoid
activation function to produce a probability ˆ ∈ [
            <xref ref-type="bibr" rid="ref1">0, 1</xref>
            ] for the positive class (label 1, indicating an
advertisement). Binary classification is achieved by thresholding the sigmoid output at 0.5.
          </p>
          <p>The models were made by feeding each text into the appropriate Hugging Face tokenizer (MPNet or
MiniLM). For tokenization, we used both truncation and padding and made the maximum sequence
length be the same as the model’s default. The tokenizer creates tensors called input-ids and
attentionmask, that are then given to the transformer. In batch mode, all samples were lengthened to the longest
sequence in the set and then sent to the GPU thanks to the utility included in Sentence Transformers.</p>
          <p>In each experiment, the models were fine-tuned end to end, which means that the transformer
parameters and the classifier weights were updated during training. Then, we used the binary
crossentropy loss with logits that applies sigmoid activation and computes the advertisement classification
loss in a numerical manner.</p>
          <p>Training was conducted using the AdamW optimizer with a learning rate of 2 × 10− 5. Mini-batch
gradient descent was applied with a batch size of 16. The model was trained for 3 epochs, with two key
metrics monitored after each epoch: the average training loss and the F1-score on the validation set,
focusing on the positive class (label 1, indicating an advertisement). During validation, the model’s
raw outputs (logits) were passed through a sigmoid function to obtain probabilities, which were then
converted into binary predictions using a threshold of 0.5.</p>
          <p>After training, the model’s weights were saved to disk. For inference, the saved model was reloaded
and evaluated on a separate test set that was not used during training or validation. Each test example
1https://huggingface.co/tarekb21/MPnet-finetune
2https://huggingface.co/tarekb21/All-Mini-LM-v2-FineTuned
was processed through the tokenizer, then passed through the transformer and the classifier head. The
output from the classifier was passed through a sigmoid activation function to produce a probability,
which was then converted into a binary label using a threshold. To evaluate performance, standard
classification metrics were used, including precision, recall, F1-score, accuracy and the confusion matrix.</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Few-Shot Learning with LLM</title>
        <p>
          To assess how large language models (LLMs) work for advertising, a few-shot method [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] was applied
using the llama3.1 model [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and Qwen2.5 [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] in a way that simulates chat interactions. Unlike traditional
supervised methods, this approach does not involve training a dedicated classifier. Instead, the model
is guided entirely through prompting, relying on its pre-trained knowledge and in-context reasoning
capabilities to perform classification.
        </p>
        <p>The prompt was carefully designed to introduce the task and provide minimal supervision through a
few labeled examples. It begins with a task description instructing the model to determine whether a
given text is an advertisement, explicitly requesting a response of "1" for advertisements and "0" for
non-advertisements. Next, there are four examples and each one features an extract from text with an
accompanying label if it is an advertisement or not. The purpose of these examples is to show you
what the decision boundary is. The added new information is then connected to the original prompt to
create the whole in-context example set.</p>
        <p>For inference, the query was transmitted via a chat-based API interface that conforms to the OpenAI
standard. The temperature was set to 0.1 to ensure uniform performance, and the maximum output
tokens were limited to 1, as only a single numeric label was expected. The model’s response was taken
directly from the output.</p>
        <p>This few-shot setup allows for assessing how well a large pre-trained model like llama3.1 can
generalize the advertisement classification task based on some in-context examples, without needing
specific fine-tuning</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Retrieval-Augmented Generation (RAG)</title>
        <p>Retrieval Augmented Generation (RAG) has been implemented to enhance LLMs classification
performance by grounding predictions in semantically similar labeled examples. The pipeline follows
the same as in Figure 1 which is document indexing, embedding-based retrieval, re-ranking, and
classification via prompting [7].</p>
        <p>As a first step, merging the training and validation dataset was essential to widen the variety of
examples that will be embedded in order for the language model to have the most similar examples to
the test query. The dataset consists of response, meta topic, and binary label (1 for advertisement and 0
for not advertisement). For each meta topic, examples were grouped and indexed separately by label.
After that, the responses were embedded using the "all-MiniLM-L6-v2" sentence transformer, and then
stored in a FAISS (Facebook AI Similarity Search) [8]. This structure allowed for per-topic, per-label
nearest neighbor searches during retrieval.</p>
        <p>For the retrieval phase, the system will encode the test query that was given to the large language
model and turn it into a vector embedding using the same sentence transformer (all-MiniLM-L6-v2). It
then performs an approximate nearest-neighbor search using FAISS to retrieve the top K which is in
our case 5 per label which will be 10 in total from the index. If the topic does not exist in the index, a
global fallback search across all topics is executed.</p>
        <p>The first set of examples is further improved by applying a cross-encoder model known as
crossencoder/ms-marco-MiniLM-L-6-v2 to calculate the semantic similarity between each pair of the query
and each candidate. As a result, each example is given a relevance score and the candidates are reordered
so that the system can choose the M most informative ones (M=4). To address label bias, we arrange
two ad examples and two non-ad examples in the final context in the same format.</p>
        <p>At each stage of the pipeline, a specific query context was built, the retrieval, reranking and
classification procedures were run, and the final label was gathered. Using standard classification methods,
predictions were checked using standard metrics: precision, recall, F1 score and a confusion matrix.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <p>Approach
TF-IDF + Random Forest
Sentence Transformer (all-mini-lm-v2) +
Random Forest (no fine-tuning)
Sentence Transformer (all-mini-lm-v2)
fine-tuned
Sentence Transformer (MPnet) fine-tuned
Few-shot (4 shots, LLaMA 3.1)
Few-shot (4 shots, Qwen 2.5)
RAG</p>
      <p>Precision
0.88
0.62
0.97</p>
      <p>Table 2 shows all the results of the evaluated models using: precision, recall, F1-Score, and accuracy.
Based on the table, the best performance was achieved by the fine-tuned Sentence Transformer
(allMiniLM-L6-v2), which has a F1-Score of 0.97 and an accuracy of 0.97. This shows the efectiveness of
ifne-tuning a pre-trained model for a specific task.</p>
      <p>An impressive result was achieved with the classical TF-IDF + Random Forest model, with an F1-score
of 0.85 and accuracy of 0.87. According to this, well-prepared traditional pipelines are still very valuable
when resources for computations are not plentiful. Sentence embeddings without fine-tuning (i.e.,
all-MiniLM-L6-v2 + Random Forest) resulted in an F1-score of only 0.61 which emphasizes that sentence
embeddings alone may fail to capture the important diferences needed for this task.</p>
      <p>The few-shot LLM-based classifier underperformed in comparison to the sentence transformer, even
though given some examples of the task. This might be the case when limited number of in-context
examples provided. The RAG approach improved compared to the few-shot LLM-based classifier,
reaching an F1-score of 0.62. Although the RAG approach did not perform as well as the fine-tuned
transformer models, it still shows promise. In the future, using fine-tuned sentence transformer
embeddings for retrieval might help the system find better examples and improve its results.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>This research addressed the task of advertisement classification by a variety of approaches, ranging from
classical machine learning baselines to recent advances in language modeling. At the start, we looked
at a TF-IDF model linked to a random forest classifier, which formed a powerful and understandable
baseline.</p>
      <p>Additionally, we implemented two configurations for a sentence transformer based approach: one
using embeddings as input to a traditional classifier, and another via full fine-tuning with an added
classification head. The fine-tuned sentence transformer model, performed impressively well compared
to this specific task.</p>
      <p>Moreover, a few shot LLM-based classifiers were used for this task, which showed limited performance
compared to other approaches, highlighting the challenges of relying solely on prompting without
task-specific training. Finally, we implemented a retrieval-augmented generation approach that retrieves
relevant training examples by semantic similarity, and uses them to guide LLM predictions. Even though
this method did not perform as well as the fine-tuned models, it could still be useful when there is
not much training data or when the type of content changes often. That’s because it can find helpful
examples on the fly without the need to be retrained.</p>
      <p>All in all, our best outcomes were delivered from models based on sentence transformers, but
promptbased LLMs and RAG-based pipelines still have potential for use in advertisement classification. For
example, future work may incorporate fine-tuned embeddings during the retrieval phase to improve
contextual grounding.</p>
    </sec>
    <sec id="sec-6">
      <title>Generative AI Declaration</title>
      <p>During the preparation of this work, the authors used ChatGPT, and LanguageTool in order to: Grammar
and spelling check, Paraphrase and reword. After using these tools, the authors reviewed and edited
the content as needed and take full responsibility for the publication’s content.
Y. Su, Y. Zhang, Y. Wan, Y. Liu, Z. Cui, Z. Zhang, Z. Qiu, Qwen2.5 technical report, 2025. URL:
https://arxiv.org/abs/2412.15115. arXiv:2412.15115.
[7] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W.-t. Yih,
T. Rocktäschel, S. Riedel, D. Kiela, Retrieval-augmented generation for knowledge-intensive NLP
tasks, in: Advances in Neural Information Processing Systems (NeurIPS), volume 33, 2020, pp.
9459–9474.
[8] M. Douze, A. Guzhva, C. Deng, J. Johnson, G. Szilvasy, P.-E. Mazaré, M. Lomeli, L. Hosseini, H. Jégou,
The faiss library, 2025. URL: https://arxiv.org/abs/2401.08281. arXiv:2401.08281.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Louppe</surname>
          </string-name>
          , Understanding random forests:
          <source>From theory to practice</source>
          ,
          <year>2015</year>
          . URL: https://arxiv.org/ abs/1407.7502. arXiv:
          <volume>1407</volume>
          .
          <fpage>7502</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Kowsari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Jafari Meimandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heidarysafa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mendu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. E.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <surname>D. E. Brown,</surname>
          </string-name>
          <article-title>Text classification algorithms: A survey</article-title>
          ,
          <source>Information</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <article-title>150</article-title>
          . doi:
          <volume>10</volume>
          .3390/info10040150.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reimers</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Gurevych</surname>
          </string-name>
          ,
          <article-title>Sentence-bert: Sentence embeddings using siamese bert-networks</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on NLP (EMNLP-IJCNLP)</source>
          ,
          <source>Association for Computational Linguistics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>3982</fpage>
          -
          <lpage>3992</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>T. B. Brown</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Mann</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Ryder</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Subbiah</surname>
            ,
            <given-names>J. D.</given-names>
          </string-name>
          <string-name>
            <surname>Kaplan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Dhariwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Neelakantan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Shyam</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Sastry</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Askell</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Agarwal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Herbert-Voss</surname>
            , G. Krueger,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Henighan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Child</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Ramesh</surname>
            ,
            <given-names>D. M.</given-names>
          </string-name>
          <string-name>
            <surname>Ziegler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Winter</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Hesse</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Chen</surname>
            , E. Sigler,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Litwin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Chess</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Berner</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>McCandlish</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Radford</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Amodei</surname>
          </string-name>
          ,
          <article-title>Language models are few-shot learners</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems (NeurIPS)</source>
          , volume
          <volume>33</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>1877</fpage>
          -
          <lpage>1901</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Touvron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavril</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Izacard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Martinet</surname>
          </string-name>
          , M.
          <article-title>-</article-title>
          <string-name>
            <surname>A. Lachaux</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Lacroix</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Rozière</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Hambro</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Azhar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Joulin</surname>
          </string-name>
          , E. Grave, G. Lample,
          <article-title>LLaMA: Open and eficient foundation language models</article-title>
          ,
          <source>CoRR abs/2302</source>
          .13971 (
          <year>2023</year>
          ). ArXiv:
          <volume>2302</volume>
          .
          <fpage>13971</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Qwen</surname>
            , :,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Hui</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Zheng</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Tu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Dang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Bao</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Men</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Fan</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>