<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Git Gud at Touché: Unified RAG Pipeline for Native Ad Generation and Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sameer Kamani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Taqi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Ansab Chaudhary</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Ahmad Humayun Hanif</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Faisal Alvi</string-name>
          <email>faisal.alvi@sse.habib.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abdul Samad</string-name>
          <email>abdul.samad@sse.habib.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dhanani School of Science and Engineering, Habib University</institution>
          ,
          <addr-line>Karachi</addr-line>
          ,
          <country country="PK">Pakistan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>This project investigates the integration and detection of advertisements within LLM-generated responses using Retrieval-Augmented Generation (RAG). We address two key tasks: first, generating contextually relevant advertisements within RAG-retrieved document segments, ensuring coherence and structured output; and second, developing robust models for detecting embedded advertisements to maintain content integrity. Using the Webis Generated Native Ads 2024 dataset, we evaluate the effectiveness of various RAG-based generation strategies and detection methods. Our research explores techniques for balancing ad relevance with informational content, contributing to the development of transparent and ethical AI-driven advertising.</p>
      </abstract>
      <kwd-group>
        <kwd>Retrieval-Augmented Generation</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Ad Integration</kwd>
        <kwd>Ad Detection</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>Several studies propose frameworks that use RAG to integrate ads into LLM outputs
in a contextually relevant manner. For instance, Hajiaghayi et al. (2024) introduce a segment auction
model in which external document segments, selected based on relevance and educational value, are
dynamically merged with LLM responses to embed ad content. In this model, multiple modules work
together to optimize ad placement and revenue generation, demonstrating that RAG-based ad insertion
can outperform traditional keyword-based methods by improving the contextual alignment of ads.</p>
      <p>Similarly, Feizi et al. (2023) present a modular architecture specifically designed
for LLM advertising. Their framework divides the problem into four core components, including a
modification module, which adapts the original LLM output to incorporate ad text
without disrupting content coherence. The framework is notable because it
addresses the technical challenges of dynamically integrating ad content in real time and ensures that
the inserted ads remain contextually relevant. This work thus offers critical insights into balancing
persuasive ad integration with the preservation of natural text flow.</p>
      <p>Zelch et al. (2023) present a pilot
study in which generative models (e.g., GPT-3.5, GPT-4, and YouChat) are prompted to insert ads into
search results across three scenarios: Unrelated Ads, Loosely Related Ads, and Very Related Ads. Across
all three scenarios the models perform poorly: the inserted ads either lack relevance or are jarring
rather than subtle. Nevertheless, the study serves as an important proof of concept, demonstrating that
basic prompt engineering makes it technically feasible to integrate native ads into text SERPs.</p>
      <p>With advertisements increasingly embedded in LLM outputs, detecting them poses unique challenges.
Schmidt et al. (2024) focus on detecting native ads within conversational search responses by using
fine-tuned sentence transformer models such as MiniLM and MPNet. These models learn to identify
subtle stylistic and contextual cues that differentiate native ads from organic content. Although
zero-shot detection methods using LLMs like GPT-4 have been explored, they generally underperform
compared to fine-tuned transformers, particularly when ads are seamlessly integrated.</p>
      <p>Kok-Shun and Chan (2025) applied GPT-4o to detect sponsored ads in video transcripts. The model
was prompted to identify ad segments based on context and intent, without fine-tuning. KeyBERT,
a BERT-based keyword extractor, was used to extract contextually relevant keywords from the transcripts
by leveraging BERT embeddings. GPT-4o was then used to group the extracted keywords
into broader categories, reducing dimensionality and improving thematic analysis.</p>
      <p>In general, the literature demonstrates growing interest and promising progress in using RAG and
LLMs for dynamic, context-aware ad integration, while also highlighting the technical and ethical
challenges of detecting seamlessly embedded advertising. These studies collectively lay the groundwork
for more nuanced, adaptive, and effective ad strategies in generative AI systems.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. TIRA Submissions</title>
        <p>Our work was submitted via TIRA; credit to Fröbe et al. (2023) [28] for streamlining the process. We
made multiple submissions, and the table above lists our final work. The only difference between our
submissions is the model used; the overall pipeline remained the same.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Dataset</title>
        <p>Our dataset for Task 1 is the Webis Generated Native Ads 2024 dataset with segmented document
excerpts from MS MARCO V2.1. The data is given in the following format:</p>
        <list list-type="bullet">
          <list-item><p>Query: Dictionary of ID (Topic ID) and Keyword Query (Text).</p></list-item>
          <list-item><p>Candidates: List of segments that were retrieved for the Keyword Query.</p>
            <list list-type="bullet">
              <list-item><p>Docid: ID for the segment in MS MARCO v2.1.</p></list-item>
              <list-item><p>Score: Score calculated by Elasticsearch. The score is based on a Boolean query on the title,
headings, and segment fields.</p></list-item>
              <list-item><p>Edu_Value: Educational value of the segment as estimated by the
llm-data-textbook-quality-fasttext-classifier-v2.</p></list-item>
              <list-item><p>Doc: A candidate segment consisting of the URL, Title, and Headings of the containing Web
document as well as the segment text.</p></list-item>
            </list>
          </list-item>
          <list-item><p>Advertisements: A list where each entry is either None or a dictionary.</p>
            <list list-type="bullet">
              <list-item><p>Item: Name of the brand, service, or product to be advertised.</p></list-item>
              <list-item><p>Type: Describes the type of the item (e.g., Brand or a specific type of product).</p></list-item>
              <list-item><p>Qualities: A descriptive string of item attributes for use in the ad.</p></list-item>
            </list>
          </list-item>
        </list>
        <p>Our dataset for Task 2 is a JSONL version of the Webis Generated Native Ads 2024 dataset and consists of
two main parts. Firstly, the response data is given in the following format:</p>
        <list list-type="bullet">
          <list-item><p>Id: ID of the response.</p></list-item>
          <list-item><p>Service: Conversational search engine from which the original response was obtained.</p></list-item>
          <list-item><p>Meta_Topic: One of ten categories that the query belongs to.</p></list-item>
          <list-item><p>Query: Keyword query for which the response was obtained.</p></list-item>
          <list-item><p>Response: Full text of the response.</p></list-item>
        </list>
        <p>Secondly, each response in the previous file has corresponding label data, which contains the following
elements:</p>
        <list list-type="bullet">
          <list-item><p>Id: ID of the response.</p></list-item>
          <list-item><p>Advertisement: Name of the product or brand that is advertised in the pair. It is None for
responses without an ad.</p></list-item>
          <list-item><p>Label: 1 for responses with an ad and 0 otherwise.</p></list-item>
          <list-item><p>Span: Character span containing the advertisement. It is None for responses without an ad.</p></list-item>
          <list-item><p>Sen_span: Character span for the full sentence containing the advertisement. It is None for
responses without an ad.</p></list-item>
        </list>
        <p>The labels help us classify the response as having an advertisement or not, and then further identify
where exactly said advertisement was placed.</p>
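        <p>As an illustration of how the two Task 2 files fit together, the following minimal sketch joins response records with their label records by ID and slices out the labeled advertisement. The lowercase field names, the representation of a span as a [start, end] pair, and the toy records are assumptions made for this example and may differ from the released files.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import json

# Toy records for illustration only; they are not taken from the dataset.
RESPONSES_JSONL = """\
{"id": "r1", "service": "demo", "meta_topic": "banking", "query": "money market rates", "response": "Rates vary. Try AcmeBank today. Compare offers."}
{"id": "r2", "service": "demo", "meta_topic": "banking", "query": "inside money", "response": "Inside money is created by loans."}
"""
LABELS_JSONL = """\
{"id": "r1", "advertisement": "AcmeBank", "label": 1, "span": [16, 24], "sen_span": [12, 31]}
{"id": "r2", "advertisement": null, "label": 0, "span": null, "sen_span": null}
"""


def read_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def join_labels(responses, labels):
    """Attach each label record to the response that shares its id."""
    by_id = {lab["id"]: lab for lab in labels}
    merged = []
    for resp in responses:
        lab = by_id.get(resp["id"])
        if lab is not None:  # skip responses that have no label record
            merged.append({**resp, **lab})
    return merged


def ad_text(record, key="span"):
    """Return the labeled substring, or None for ad-free responses."""
    if record["label"] == 0 or record[key] is None:
        return None
    start, end = record[key]  # assumed to be a [start, end) character pair
    return record["response"][start:end]


merged = join_labels(read_jsonl(RESPONSES_JSONL), read_jsonl(LABELS_JSONL))
for rec in merged:
    print(rec["id"], rec["label"], ad_text(rec), ad_text(rec, "sen_span"))
```
]]></preformat>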
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Approach For Sub-Task 1</title>
        <p>Our pipeline begins by embedding each user query and all retrieved document segments using the
all-MiniLM-L6-v2 [25] sentence transformer. Retrieval is performed in three stages. First, each
candidate segment already carries two primary scores, an Elasticsearch relevance score and an
educational-value estimate, from which we compute initial rankings. Second, we index the
segment embeddings in FAISS (IndexFlatL2) [27] and retrieve the top <italic>k</italic><sub>init</sub> segments by
nearest-neighbor distance to the query embedding. Third, we rerank this shortlist with the cross-encoder
ms-marco-MiniLM-L-6-v2 [26] to obtain fine-grained relevance scores. We convert all three signals
into 1-based ranks and compute a weighted average rank, selecting the top <italic>k</italic> segments as context for
generation. The values <italic>k</italic><sub>init</sub> = 10 and <italic>k</italic> = 3 remain fixed across all queries to balance retrieval quality
with computational efficiency.</p>
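        <p>The rank-fusion step can be sketched as follows. This is a minimal illustration, not the submitted implementation: it takes the three signals to be the Elasticsearch score, the FAISS L2 distance, and the cross-encoder score, and the equal weights and toy scores are assumptions, since the text does not state the actual weights.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def to_ranks(scores, higher_is_better=True):
    """Convert raw scores to 1-based ranks, where rank 1 is the best."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def fuse(signals, weights, k=3):
    """Weighted average of per-signal ranks; returns indices of the top k.

    signals: list of (scores, higher_is_better) pairs, one per signal.
    weights: one non-negative weight per signal (assumed values).
    """
    n = len(signals[0][0])
    total = sum(weights)
    fused = [0.0] * n
    for (scores, higher), w in zip(signals, weights):
        for i, r in enumerate(to_ranks(scores, higher)):
            fused[i] += w * r / total
    # A lower fused rank is better.
    return sorted(range(n), key=lambda i: fused[i])[:k]


# Toy scores for four candidate segments (illustrative values only).
es_score = [12.0, 9.5, 11.0, 7.0]   # Elasticsearch: higher is better
l2_dist = [0.8, 0.3, 0.5, 1.2]      # FAISS L2 distance: lower is better
ce_score = [2.1, 3.4, 0.9, -1.0]    # cross-encoder: higher is better

top = fuse([(es_score, True), (l2_dist, False), (ce_score, True)],
           weights=[1.0, 1.0, 1.0], k=3)
print(top)
```
]]></preformat>
        <p>Averaging ranks rather than raw scores sidesteps the fact that the three signals live on different scales.</p>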
        <p>For response generation, we used a variety of different models, but our primary work was done using
the Qwen 2.5 7B Instruct [12] model via HuggingFace’s text-generation pipeline (temperature 0.7, top_p
0.9, max_new_tokens 256). We first produce an advertisement-free baseline response to the following prompt:</p>
        <preformat>Query: &lt;user query&gt;
Context: &lt;top k segments&gt;
Please generate a detailed and coherent response.
Response:</preformat>
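        <p>A small sketch of how this prompt could be assembled is given below. The helper name build_prompt, the newline-joined context, and the do_sample flag are assumptions for illustration; only the template wording and the three sampling values come from the text.</p>
        <preformat preformat-type="code"><![CDATA[
```python
PROMPT_TEMPLATE = (
    "Query: {query}\n"
    "Context: {context}\n"
    "Please generate a detailed and coherent response.\n"
    "Response:"
)

# Sampling values quoted in the text. do_sample=True is an assumption:
# sampling must be enabled for temperature and top_p to take effect.
GENERATION_KWARGS = {
    "temperature": 0.7,
    "top_p": 0.9,
    "max_new_tokens": 256,
    "do_sample": True,
}


def build_prompt(query, segments, k=3):
    """Fill the template with the query and the top-k context segments."""
    context = "\n".join(seg.strip() for seg in segments[:k])
    return PROMPT_TEMPLATE.format(query=query, context=context)


prompt = build_prompt(
    "inside money",
    ["seg-1 text ", "seg-2 text", "seg-3 text", "seg-4 text"],
)
print(prompt)
```
]]></preformat>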
        <p>Then, for each advertised item in the dataset (skipping the None entries), we iteratively generate
up to three variants by inserting a transition sentence that mentions the item and its attributes. Each
variant is scored by a composite naturalness metric (0.0-1.0 scale) evaluating ad placement, contextual
coherence, and subtlety, along with ROUGE-1 overlap against the baseline. The naturalness metric
incorporates position analysis (preferring 30-70% text location), quality term integration, and avoidance
of explicit commercial markers. We accept the highest-scoring variant if it exceeds the thresholds on
both metrics; otherwise, we fall back to the single best candidate.</p>
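        <p>To make the selection rule concrete, here is a hedged sketch of variant scoring. The equal weighting of the three naturalness components, the list of commercial markers, both thresholds, and the toy sentences are assumptions; the text specifies only the 30-70% position preference, the use of quality terms, the avoidance of explicit commercial markers, and ROUGE-1 overlap against the baseline.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import Counter

# Assumed examples of explicit commercial markers.
COMMERCIAL_MARKERS = ("buy now", "sponsored", "click here", "limited offer")


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate and a reference text."""
    c = Counter(re.findall(r"\w+", candidate.lower()))
    r = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def naturalness(text, item, qualities):
    """Score in [0, 1] from ad position, quality terms, and subtlety."""
    idx = text.find(item)
    if idx < 0:
        return 0.0  # the item is never mentioned
    rel = idx / max(len(text), 1)
    position = 1.0 if 0.3 <= rel <= 0.7 else 0.5
    lowered = text.lower()
    quality = sum(q.lower() in lowered for q in qualities) / max(len(qualities), 1)
    subtle = 0.0 if any(m in lowered for m in COMMERCIAL_MARKERS) else 1.0
    return (position + quality + subtle) / 3


def select_variant(variants, baseline, item, qualities,
                   nat_min=0.6, rouge_min=0.4):
    """Return (best_variant, accepted); falls back to the best overall."""
    scored = [(naturalness(v, item, qualities), rouge1_f1(v, baseline), v)
              for v in variants]
    passing = [s for s in scored if s[0] >= nat_min and s[1] >= rouge_min]
    best = max(passing or scored, key=lambda s: s[0] + s[1])
    return best[2], bool(passing)


baseline = "Online banks offer higher rates on savings accounts than traditional banks."
v_good = ("Online banks offer higher rates on savings accounts. For instance, "
          "AcmeBank offers tiered interest. Traditional banks pay less.")
v_pushy = "Buy now! AcmeBank offers tiered interest. Online banks offer higher rates."
print(select_variant([v_pushy, v_good], baseline, "AcmeBank", ["tiered interest"]))
```
]]></preformat>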
        <p>Following this, we apply regex-based post-processing to strip HTML/Markdown artifacts, table
fragments, and duplicate or truncated sentences, ensuring that each generated response concludes with
proper punctuation.</p>
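        <p>The clean-up pass described above might look like the following sketch. The specific patterns are assumptions chosen for illustration (the exact expressions used in the pipeline are not given), and the rule of dropping a trailing sentence that lacks end punctuation is one possible reading of how truncated sentences are handled.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re


def clean_response(text):
    """Strip markup residue, duplicate sentences, and a truncated tail."""
    text = re.sub(r"<[^>]+>", " ", text)                      # HTML tags
    text = re.sub(r"^\s*\|.*\|\s*$", " ", text, flags=re.M)   # table rows
    text = re.sub(r"[*`#]+", "", text)                        # Markdown marks
    text = re.sub(r"\s+", " ", text).strip()                  # whitespace

    kept, seen = [], set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        key = sentence.lower()
        if sentence and key not in seen:   # drop exact duplicate sentences
            seen.add(key)
            kept.append(sentence)

    if kept and kept[-1][-1] not in ".!?":
        if len(kept) > 1:
            kept.pop()          # drop a truncated final sentence
        else:
            kept[-1] += "."     # nothing else to keep, so close it
    return " ".join(kept)


raw = ("<b>Inside money</b> is created by loans. Inside money is created by loans.\n"
       "| a | b |\n"
       "Banks offer **high** rates. This sentence is trunc")
print(clean_response(raw))
```
]]></preformat>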
        <p>As we had fine-tuned various models for Sub-Task 2, we integrated one of them into this pipeline,
namely RoBERTa-Large, as mentioned in Table 1. When an ad-injected response passes the previously
mentioned thresholds, it is then evaluated by our ad detector. If the ad is detected, we
regenerate the response, up to a limit of 10 trials due to limited resources. A similar check is applied
when generating the advertisement-free response: if our detector classifies it as having ad-like
elements, it is regenerated.</p>
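        <p>The detector-guarded regeneration loop can be sketched as below. The generator and detector are passed in as plain callables so the example stays self-contained; the stand-in lambdas and the choice to return the last candidate once the trial budget is exhausted are assumptions, as the text does not say what happens after the tenth trial.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def generate_undetected(generate, is_detected, max_trials=10):
    """Regenerate until the detector no longer fires or trials run out.

    generate:    zero-argument callable returning a candidate response.
    is_detected: callable returning True if the detector flags the text.
    Returns (text, trials_used, passed).
    """
    last = None
    for trial in range(1, max_trials + 1):
        last = generate()
        if not is_detected(last):
            return last, trial, True
    # Assumption: keep the last candidate if every trial was flagged.
    return last, max_trials, False


# Stand-ins for the LLM generator and the fine-tuned classifier.
candidates = iter([
    "Buy AcmeBank now!",
    "For instance, AcmeBank offers tiered interest.",
])
text, trials, passed = generate_undetected(
    generate=lambda: next(candidates),
    is_detected=lambda t: "buy" in t.lower(),
)
print(text, trials, passed)
```
]]></preformat>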
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Approach For Sub-Task 2</title>
        <p>For the advertisement detection task, we adopted a fine-tuning strategy using several
transformer-based models, starting with the RoBERTa-Base [16] architecture. Each model was equipped with a
binary classification head to predict whether a given LLM-generated response contains an embedded
advertisement.</p>
        <p>We maintained consistent training hyperparameters for all models (see Table 2), namely three epochs,
a batch size of 8, and a learning rate of 2e-5. Tokenization was handled using the appropriate tokenizer
for each model (e.g., RoBERTaTokenizer, DeBERTaTokenizer), and inputs consisted of full-text responses
from the labeled dataset.</p>
        <p>In our initial approach, we used RoBERTa-Base, but we then experimented with several other
architectures to explore different representational strengths. These included MPNet-v2 [14], RoBERTa-large,
DeBERTa-v3-base and DeBERTa-v3-large [19], as well as the 0x7o-roberta checkpoint [23] tailored for
ad detection tasks. Each model was trained using binary cross-entropy loss, and the evaluation metrics
included precision, recall, F1 score, and accuracy. This setup allowed us to systematically compare
architectures under a unified training and evaluation framework.</p>
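        <p>For reference, the four evaluation metrics follow directly from the confusion counts of the binary predictions. The sketch below computes them in plain Python; the label vectors are toy values for illustration, not results from our experiments.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy labels: 1 = response contains an ad, 0 = ad-free.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```
]]></preformat>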
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results</title>
      <sec id="sec-4-1">
        <title>4.1. Results for Sub-Task 1</title>
        <p>We ran our model on various subsections of the provided data to verify that advertisements are generated
consistently when required. While occasional inconsistencies occur, overall the generator embeds ads
seamlessly into the responses.</p>
        <p>For example, here is an ad-free response and its corresponding ad-injected response:</p>
        <sec id="sec-4-1-1">
          <title>Listing 1: Ad-Free Response</title>
          <p>Response: Inside money refers to the form of money that exists in bank accounts as bank
deposits and is created as a result of loans. This form of money is controlled
almost entirely by private banks and is created through the lending process.
Online-only banks tend to offer higher average rates on savings accounts and high-yield
checking compared to brick-and-mortar banks, so it’s no surprise that their money
market accounts offer better APYs as well. Major online-only banks include
ableBanking, Ally Bank, Synchrony Bank, Discover Bank, Axos Bank, Capital One,
Radius Bank, and TIAA Bank. The rates for money market accounts at these banks range
from 0.30% to 0.60% depending on the balance</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Listing 2: Ad-Injected Response</title>
          <p>Response: Inside money refers to the form of money that exists in bank accounts as bank
deposits and is created as a result of loans. This form of money is controlled
almost entirely by private banks and is created through the lending process.
Online-only banks tend to offer higher average rates on savings accounts and high-yield
checking compared to brick-and-mortar banks, so it’s no surprise that their money
market accounts offer better APYs as well. For instance, TD Beyond Checking and
Savings Accounts offers tiered interest, relationship benefits, providing an option
that aligns with these needs. Major online-only banks include ableBanking, Ally Bank,
Synchrony Bank, Discover Bank, Axos Bank, Capital One, Radius Bank, and TIAA Bank.
The rates for money market accounts at these banks range from 0.30% to 0.60%
depending on the balance.</p>
          <p>We then evaluated the 169 generated responses via the provided MiniLM baseline evaluator. The
measured performance is summarized in Table 3.</p>
          <p>The evaluation yielded a recall of 0.222 and a precision of 0.857. The low recall shows that many of
the more subtly integrated advertisements successfully evaded detection (by design), while the high
precision confirms that ordinary informational content was rarely misclassified as advertising. Together,
these results indicate that our approach effectively minimizes false positives without compromising
the intended stealth of the ad content.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results for Sub-Task 2</title>
        <p>To evaluate the effectiveness of our advertisement classifiers, we measured both F1 score and accuracy
across six transformer-based architectures. Table 4 summarizes the test-set
performance of each model, ordered from the smallest to the largest model.</p>
        <p>We observed that the smallest model, MPNet-v2 [15], achieved an F1 score of 0.9756 and an accuracy
of 0.9831. The “0x7o-roberta” checkpoint improved to an F1 score of 0.9880 and an accuracy of 0.9915.
Fine-tuning DeBERTa-v3-base [21] resulted in 0.9918 F1 and 0.9942 accuracy, while RoBERTa-Base
[17] reached 0.9920 F1 at 0.9950 accuracy. The larger models provided marginal but consistent gains:
RoBERTa-large [18] achieved 0.9930 F1 and 0.9955 accuracy, and DeBERTa-v3-large [22] led with 0.9950
F1 and 0.9960 accuracy.</p>
        <p>These results indicate that, although even our smaller models perform exceedingly well, the largest
DeBERTa-v3-large variant still offers the best overall performance, pushing both F1 and accuracy
above 99%. In practical deployments, the modest improvements from base to large checkpoints should be
weighed against the increased inference cost; however, for applications demanding maximal detection
quality, DeBERTa-v3-large is our top choice.</p>
        <p>
          Table 5 compares our best model against two lightweight baselines from "Detecting
generated native ads in conversational search" [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The confusion matrix in Figure 1a further validates
these results by illustrating the distribution of true positives, true negatives, false positives, and false
negatives.
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Error Analysis</title>
      <p>For Sub-Task 1, our model at times defaults to an ad-like structure in the ad-free response, i.e., normal text with awkward
phrasing that seems promotional in nature. This could be attributed to the retrieved segments containing text of that
nature or to the model not always being able to achieve the desired goal.</p>
      <p>Conversely, when asked to weave in promotional elements subtly, the output is generally very
competent, but at times it fails to blend messaging organically and instead resorts to forced insertions
that disrupt the surrounding text. For instance, it may abruptly drop in a branded phrase or tagline
mid-paragraph, or tack on a sales-style endorsement that jars against the neutral tone. This may be put
down to the structure of our ad generation pipeline.</p>
      <p>(a) Confusion matrix comparison on test set 1</p>
      <p>Specifically for Sub-Task 2, our model’s false positives, i.e., neutral content misclassified as ads, appear
to be caused by generic brand mentions or informational lists that lack explicit promotional intent. For
example, the mention of ’FORM, a holistic wellness program’ (ID 2550), was flagged as an ad due to
the brand name and phrases such as ’community-driven’, even though it was part of a biographical
description. Similarly, travel site lists (ID 2274) triggered false alarms because they included terms
such as ’best deals’ and ’user-friendly’, which the model associated with ads despite their neutral and
comparative context. These errors highlight the fact that the model relies on surface-level keywords and
structural patterns without any deeper contextual analysis. The model conflates factual descriptions
with covert advertising, despite a lack of persuasive language.</p>
      <p>Ads that were classified correctly but with low confidence (for example, IDs 1660 and 1116) reflect the model’s struggle
to identify subtle integration of promotional content. For example, the Fox live-streaming response (ID
1660) includes phrases like "unveil a world of entertainment" and app download instructions, which are
softly persuasive but lack direct calls to action (e.g., "sign up now"). Similarly, property management
ads (ID 1116) list services with mild promotional language, such as "secure and efficient self-guided touring
technology," blending ads into informative content. The model’s uncertainty arises because these ads
avoid overt markers and instead rely on value-driven descriptors that overlap with neutral advice.
These cases are close to the model’s decision boundary, where the absence of strong ad-specific signals
reduces confidence, even when the predictions are correct. Improving detection here requires training
the model to recognize implicit persuasion rather than relying solely on explicit triggers.</p>
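      <p>One way to surface such borderline cases is to convert the classifier logits into probabilities and flag predictions whose top-class probability falls below a cut-off. The sketch below illustrates the idea; the 0.75 cut-off, the response IDs, and the logit values are hypothetical and are not taken from our experiments.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def low_confidence(logits_by_id, threshold=0.75):
    """Return (id, predicted_label, confidence) for uncertain predictions."""
    flagged = []
    for rid, logits in logits_by_id.items():
        probs = softmax(logits)
        conf = max(probs)
        if conf < threshold:
            flagged.append((rid, probs.index(conf), round(conf, 3)))
    return flagged


# Hypothetical two-class logits: index 0 = no ad, index 1 = ad.
example_logits = {
    "resp-a": [0.1, 0.4],    # near the decision boundary
    "resp-b": [-3.0, 4.0],   # confidently classified as an ad
}
print(low_confidence(example_logits))
```
]]></preformat>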
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this project, we presented a unified framework for both the seamless integration and robust detection
of advertisements in Retrieval-Augmented Generation (RAG) systems. For Sub-Task 1, our three-stage
retrieval pipeline (combining Elasticsearch relevance, sentence-transformer embeddings, and
cross-encoder reranking) provided highly pertinent document contexts to the chosen model, enabling the
generation of coherent, natural-sounding responses with embedded ads. Evaluation against official
metrics confirmed high precision in ad placement while maintaining overall language fluency, although
further improvements in recall of the subtler insertions remain possible.</p>
      <p>For Sub-Task 2, we fine-tuned a suite of transformer-based classifiers, from MPNet-v2 through
DeBERTa-v3-large, achieving a top F1 score of 99.50% and accuracy of 99.60% with the largest model.
These results underscore the strength of transformer fine-tuning for detecting native advertisements,
even when they are carefully blended into organic content.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Future Work</title>
      <p>Looking ahead, integrating reinforcement-learning-based ad insertion policies could further
improve our results. We could also fine-tune our LLM so that it learns
transitions that introduce ads naturally while keeping them subtle. On the detection side,
given the near-ceiling performance of the model on the existing test set, we plan to enrich our training
data by using our outputs from Sub-Task 1. Specifically, we will generate a diverse set of ad-injected
responses using our RAG-based generator and then label these automatically (or via lightweight human
review) to augment the Sub-Task 2 dataset. This synthetic data will expose the classifier to a wider
variety of subtle advertising patterns, improving its ability to detect novel or adversarial insertions
and helping the model generalize better to real-world data and different variants of ads.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The authors would like to acknowledge the support provided by the Office of Research (OoR) at Habib
University, Karachi, Pakistan, for funding this project through the internal research grant IRG-2235.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors employed ChatGPT and Grammarly AI tools for
grammar checking, paraphrasing, rewording and consistency checking of sentences. After using the
tools, the authors reviewed and edited the content as required and thereby take full responsibility for
the publication’s content.</p>
      <p>[8] Borchers, C., Gala, D. S., Gilburt, B., Oravkin, E., Bounsi, W., Asano, Y. M., &amp; Kirk, H. R. (2022). Looking for a handsome carpenter! Debiasing GPT-3 job advertisements [Preprint]. arXiv. https://doi.org/10.48550/arXiv.2205.11374</p>
      <p>[9] Feizi, S., Hajiaghayi, M. T., Rezaei, K., &amp; Shin, S. (2023). Online advertisements with LLMs: Opportunities and challenges [Preprint]. arXiv. https://doi.org/10.48550/arXiv.2311.07601</p>
      <p>[10] Meguellati, E., Han, L., Bernstein, A., Sadiq, S., &amp; Demartini, G. (2024). How good are LLMs in generating personalized advertisements. In WWW ’24: Companion Proceedings of the ACM Web Conference 2024 (pp. 826–829). https://doi.org/10.1145/3589335.3651520</p>
      <p>[11] Isozaki, I. (2023, November 26). Literature review on RAG (Retrieval Augmented Generation) for custom domains. Medium. Retrieved from: https://isamu-website.medium.com/literature-review-on-rag-retrieval-augmented-generation-for-custom-domains-325bcef98be4</p>
      <p>[12] Qwen Team, An, Y., Yang, B., Zhang, B., Hui, B., Zheng, B., Yu, B., Li, C., Liu, D., Huang, F., Wei, H., Lin, H., Yang, J., Tu, J., Zhang, J., Yang, J., Yang, J., Zhou, J., Lin, J., Dang, K., Lu, K., Bao, K., Yang, K., Yu, L., Li, M., Xue, M., Zhang, P., Zhu, Q., Men, R., Lin, R., Li, T., Tang, T., Xia, T., Ren, X., Ren, X., Fan, Y., Su, Y., Zhang, Y., Wan, Y., Liu, Y., Cui, Z., Zhang, Z., Qiu, Z. (2025). Qwen2.5 Technical Report [Preprint]. arXiv. https://arxiv.org/abs/2412.15115</p>
      <p>[13] Qwen Team. (2025). Qwen2.5-14B-Instruct. Hugging Face Model Card. https://huggingface.co/Qwen/Qwen2.5-14B-Instruct</p>
      <p>[14] Song, K., Tan, X., Qin, T., Lu, J., &amp; Liu, T.-Y. (2020). MPNet: Masked and Permuted Pre-training for Language Understanding [Preprint]. arXiv. https://arxiv.org/abs/2004.09297</p>
      <p>[15] Sentence-Transformers Team. (2023). all-mpnet-base-v2. Hugging Face Model Card. https://huggingface.co/sentence-transformers/all-mpnet-base-v2</p>
      <p>[16] Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., &amp; Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach [Preprint]. arXiv. https://arxiv.org/abs/1907.11692</p>
      <p>[17] Hugging Face Inc. (2024). RoBERTa model documentation. Hugging Face Transformers Docs. https://huggingface.co/docs/transformers/en/model_doc/roberta</p>
      <p>[18] Facebook AI. (2024). roberta-large. Hugging Face Model Card. https://huggingface.co/FacebookAI/roberta-large</p>
      <p>[19] He, P., Liu, X., Gao, J., &amp; Chen, W. (2021). DeBERTa: Decoding-enhanced BERT with Disentangled Attention [Preprint]. arXiv. https://arxiv.org/abs/2006.03654</p>
      <p>[20] Hugging Face Inc. (2024). DeBERTa model documentation. Hugging Face Transformers Docs. https://huggingface.co/docs/transformers/model_doc/deberta</p>
      <p>[21] Microsoft. (2024). deberta-v3-base. Hugging Face Model Card. https://huggingface.co/microsoft/deberta-v3-base</p>
      <p>[22] Microsoft. (2024). deberta-v3-large. Hugging Face Model Card. https://huggingface.co/microsoft/deberta-v3-large</p>
      <p>[23] 0x7o. (2024). roberta-base-ad-detector. Hugging Face Model Card. https://huggingface.co/0x7o/roberta-base-ad-detector</p>
      <p>[24] Wang, X., Wang, B., Wang, R., &amp; Liu, W. (2020). MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. In Findings of EMNLP (pp. 4688–4696). https://doi.org/10.18653/v1/2020.findings-emnlp.418</p>
      <p>[25] Microsoft. (2023). microsoft/MiniLM-L6-v2. Hugging Face Model Card. https://huggingface.co/microsoft/MiniLM-L6-v2</p>
      <p>[26] cross-encoder. (2023). Cross-Encoder for MS MARCO (ms-marco-MiniLM-L6-v2) [Model card]. Hugging Face. https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2</p>
      <p>[27] Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazaré, P.-E., Lomeli, M., Hosseini, L., &amp; Jégou, H. (2024). The Faiss library [Preprint]. arXiv. https://arxiv.org/abs/2401.08281</p>
      <p>[28] Fröbe, M. et al. (2023). Continuous Integration for Reproducible Shared Tasks with TIRA.io. In: Kamps, J., et al. Advances in Information Retrieval. ECIR 2023. Lecture Notes in Computer Science, vol 13982. Springer, Cham. https://doi.org/10.1007/978-3-031-28241-6_20</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Kiesel</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Çöltekin, Ç.,
          <string-name>
            <surname>Gohsen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heineking</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heinrich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fröbe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aliannejadi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erjavec</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , . . .
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>Overview of Touché 2025: Argumentation Systems</article-title>
          . In
          <source>CLEF 2025: Conference and Labs of the Evaluation Forum</source>
          , Madrid, Spain.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Kok-Shun</surname>
            ,
            <given-names>B. V.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>Leveraging ChatGPT for sponsored ad detection and keyword extraction in YouTube videos [Work-in-progress paper]</article-title>
          . arXiv. https://doi.org/10.48550/arXiv.2502.15102
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Middha</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          (
          <year>2021</year>
          ).
          <article-title>Reinforcing pretrained models for generating attractive text advertisements</article-title>
          .
          <source>In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</source>
          (pp.
          <fpage>3697</fpage>
          -
          <lpage>3707</lpage>
          ). https://doi.org/10.1145/3447548.3467105
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>Deep learning-based saliency assessment model for product placement in video advertisements</article-title>
          .
          <source>Journal of Applied Computer Science</source>
          . https://doi.org/10.69987/JACS.2024.40503
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Hajiaghayi</surname>
            ,
            <given-names>M. T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lahaie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rezaei</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Shin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>Ad auctions for LLMs via retrieval augmented generation [Preprint]</article-title>
          . arXiv. https://doi.org/10.48550/arXiv.2406.09459
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zelch</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bevendorff</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>Detecting generated native ads in conversational search [Preprint]</article-title>
          . arXiv. https://doi.org/10.48550/arXiv.2402.04889
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Zelch</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2023</year>
          ).
          <article-title>Commercialized generative AI: A critical study of the feasibility and ethics of generating native advertising using large language models in conversational web search [Preprint]</article-title>
          . arXiv. https://doi.org/10.48550/arXiv.2310.04892
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>