<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Overview of Touché 2025: Argumentation Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Johannes Kiesel</string-name>
          <email>johannes.kiesel@uni-weimar.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Çağrı Çöltekin</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcel Gohsen</string-name>
          <email>marcel.gohsen@uni-weimar.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Heineking</string-name>
          <email>sebastian.heineking@uni-leipzig.de</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maximilian Heinrich</string-name>
          <email>maximilian.heinrich@uni-weimar.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maik Fröbe</string-name>
          <email>maik.froebe@uni-jena.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tim Hagen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mohammad Aliannejadi</string-name>
          <email>m.aliannejadi@uva.nl</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sharat Anand</string-name>
          <email>sharat.annd@uni-weimar.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomaž Erjavec</string-name>
          <email>tomaz.erjavec@ijs.si</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Hagen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matyáš Kopp</string-name>
          <email>kopp@ufal.mf.cuni.cz</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nikola Ljubešić</string-name>
          <email>nikola.ljubesic@ijs.si</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katja Meden</string-name>
          <email>katja.meden@ijs.si</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nailia Mirzakhmedova</string-name>
          <email>nailia.mirzakhmedova@uni-weimar.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vaidas Morkevičius</string-name>
          <email>vaidas.morkevicius@ktu.lt</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Harrisen Scells</string-name>
          <email>harry.scells@uni-leipzig.de</email>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Moritz Wolter</string-name>
          <email>moritz.wolter09@gmail.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ines Zelch</string-name>
          <email>ines.zelch@uni-jena.de</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Potthast</string-name>
          <email>martin.potthast@uni-kassel.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benno Stein</string-name>
          <email>benno.stein@uni-weimar.de</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Jožef Stefan Institute</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Kaunas University of Technology</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Leipzig University</institution>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Tübingen</institution>
        </aff>
      </contrib-group>
      <pub-date>
<year>2025</year>
      </pub-date>
      <abstract>
<p>This paper is the extended overview of Touché: the sixth edition of the lab on argumentation systems that was held at CLEF 2025. With the goal of fostering the development of support technologies for decision-making and opinion-forming, we organized four shared tasks: (1) Retrieval-Augmented Debating (RAD), in which participants submit generative retrieval systems that argue against their users and evaluate such systems (new task); (2) Ideology and Power Identification in Parliamentary Debates, in which participants identify from a speech the political leaning of the speaker's party and whether it was governing at the time of the speech (2nd edition); (3) Image Retrieval/Generation for Arguments, in which participants find images to convey a written argument (4th edition, joint task with ImageCLEF); and (4) Advertisement in Retrieval-Augmented Generation, in which participants generate responses to queries with ads inserted and detect such inserted ads (new task). In this paper, we describe these tasks, their setup, and participating approaches in detail.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Decision-making and opinion-forming are everyday tasks that involve weighing pro and con arguments
for or against different options. With ubiquitous access to all kinds of information on the web, everybody
has the chance to acquire knowledge for these tasks on almost any topic. However, current information
systems are primarily optimized for returning relevant results and do not address deeper analyses of
arguments or multi-modality. To close this gap, the Touché lab series, running since 2020, has several
tasks to advance both argumentation systems and the evaluation thereof. Previous events and tasks,
data, and publications are available at https://touche.webis.de/. The 2025 edition of Touché features the
following shared tasks:
1. Retrieval-Augmented Debating (RAD; new task) features two sub-tasks in argumentative agent
research of (1) generating responses to argue against a simulated debate partner and (2) evaluating
systems of sub-task 1.
2. Ideology and Power Identification in Parliamentary Debates (2nd edition) features three sub-tasks
in debate analysis of detecting the (1) orientation on the traditional left–right spectrum, (2) position
of power of the speaker’s party in the governance of the country or the region, and (3) position
of the speaker’s party on the scale of populism vs. pluralism.
3. Image Retrieval/Generation for Arguments (4th edition; joint task with ImageCLEF [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]) is about
finding images to help convey an argument.
4. Advertisement in Retrieval-Augmented Generation (new task) features two sub-tasks in
retrieval-augmented generation of (1) generating responses with advertisements inserted and (2) detecting
whether a response contains an advertisement.
      </p>
      <sec id="sec-1-1">
        <title>In total, 12 teams participated in Touché in 2025.</title>
        <p>• Two teams participated in the Retrieval-Augmented Debating task (cf. Section 4) and submitted
19 runs. For debating (sub-task 1), the participants employed the provided Elasticsearch API,
but used language models for query generation, answer selection, and answer generation. For
evaluation (sub-task 2), the participants also focused on prompting language models.
• Four teams participated in the Ideology and Power Identification in Parliamentary Debates task (cf.</p>
        <p>Section 5) and submitted 20 runs. The approaches used traditional machine learning techniques,
fine-tuning of multilingual pretrained models, and prompting large language models, among
others.
• Three teams participated in the Image Retrieval/Generation for Arguments task (cf. Section 6),
submitting a total of seven runs. The teams employed various approaches, including image
retrieval using methods such as CLIP, as well as image generation using Stable Diffusion.
• Four teams participated in the Advertisement in Retrieval-Augmented Generation task (cf.
Section 7) and submitted 17 runs. All teams participated in the classification sub-task and primarily
submitted approaches based on fine-tuned encoder models. The generation sub-task received
submissions from three teams that used models from the Qwen and Mistral families to generate
responses from—in some cases re-ranked—lists of relevant document segments.</p>
        <p>
          The corpora, topics, and judgments created at Touché are freely available to the research community
on the lab’s website (https://touche.webis.de/). A condensed version of this paper is published in the CLEF 2025 proceedings [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>Argumentation systems are diverse and are connected to many fields within and outside of computer
science. The following sections review the related work and background for each Touché task of 2025.</p>
      <sec id="sec-2-1">
        <title>2.1. Retrieval-Augmented Debating</title>
        <p>
          Psychological literature has shown that engaging in conversational argumentation enhances individuals’
argumentation skills, which can also improve their performance in non-conversational contexts, such
as writing argumentative essays [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Apart from the fact that argumentation is an integral part of
everyday communication, improving argumentation skills can have a positive impact on collaboration
and problem-solving abilities [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Following these hypotheses, ArgueTutor [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is an agent-based
tutoring system that provides constructive criticism on solved argumentative writing tasks. However,
the ArgueTutor system did not engage in conversational argumentation with its users.
        </p>
        <p>
          In contrast, Project Debater [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] presented a fully automatic debate system that was designed to
challenge humans in formal debates. The debate system employed retrieval and argument mining
mechanisms to find counterarguments that challenge the human’s stance. Though similar to the
conversations in our task, the turns in a formal debate are much longer, allowing each participant to
make several points and attack their opponent before their turn ends, with the goal to convince an
audience that they are the better debater. In contrast, turns in our task more closely resemble informal
debates in which participants directly challenge the arguments after they are presented.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Ideology and Power Identification in Parliamentary Debates</title>
        <p>
          The task addresses important aspects of political discourse: ideology and power, as in last year's edition [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], and this year also populism detection, an important current issue in politics. Although a simplification,
political orientation on the left-to-right spectrum has been one of the defining properties of political
ideology [
          <xref ref-type="bibr" rid="ref8 ref9">8, 9</xref>
          ]. Power is another factor that shapes the political discourse [
          <xref ref-type="bibr" rid="ref10 ref11 ref12">10, 11, 12</xref>
          ]. Automatic
identification of political orientation from texts has attracted considerable interest [
          <xref ref-type="bibr" rid="ref13 ref14 ref15 ref16 ref17">13, 14, 15, 16, 17</xref>
          ],
including a few recent shared tasks [
          <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
          ]. The present task differs from the earlier ones with respect
to the source material (parliamentary debates, rather than the popular sources of social media or news)
and multilinguality. Despite its central role in critical discourse analysis, to the best of our knowledge,
power in parliamentary debates has not been studied computationally. There have been only a few recent
computational studies providing indications of linguistic differences between governing and opposition
parties [
          <xref ref-type="bibr" rid="ref20 ref21 ref22 ref23">20, 21, 22, 23</xref>
          ]. The present shared task and associated data are likely to provide a reference for
future studies investigating power in political discourse. Similarly, although it is a well-studied topic
in political science [
          <xref ref-type="bibr" rid="ref24 ref25 ref26">24, 25, 26</xref>
          ], there are relatively few computational studies of populist discourse, and,
to the best of our knowledge, this is the first shared task on populism detection.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Image Retrieval/Generation for Arguments</title>
        <p>
          Arguments are complex symbolic structures used to exchange reasons and to defend or challenge
positions [
          <xref ref-type="bibr" rid="ref27 ref28">27, 28</xref>
          ]. In a world where digital communication increasingly relies on visual media, visual
arguments are becoming ever more significant [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ]. Images can enhance the acceptability of individual
premises [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], and they also have the power to evoke strong emotional responses—such as anxiety,
fear, or hope—or even to prescribe specific actions [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ]. One of the core challenges in analyzing visual
arguments is that images often capture only a single moment in time, making it difficult to convey a
complete argumentative structure. While images can be rich in information, they are also inherently
ambiguous [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. Therefore, some scholars argue that images cannot constitute arguments [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ]—but
others contend that they can [
          <xref ref-type="bibr" rid="ref34">34</xref>
          ]. An additional perspective proposes that image sequences are more
effective for conveying an argument [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ]. However, when combined with text, the inherent ambiguity
of images can be reduced, fostering “thick representations” of issues that highlight the importance and
strength of the argument, thereby enhancing their persuasive power [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ]. Therefore, images can serve
as visual reasons, either reinforcing fact-based claims or questioning established beliefs [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ].
        </p>
        <p>
          Several promising research directions can be further pursued at the intersection of argumentation
and visual communication. One such direction involves analyzing persuasion techniques, particularly
as they appear in visual formats such as memes [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ]. Another focuses on exploring how readily textual
content can be translated into visual form within an image. While initial progress has been made using
metrics such as imaginability [
          <xref ref-type="bibr" rid="ref37">37</xref>
          ] and concreteness [
          <xref ref-type="bibr" rid="ref38">38</xref>
          ] to evaluate the visualizability of text, this
remains an open area of investigation. Another promising direction involves studying argument quality
dimensions—such as acceptability, credibility, emotional appeal, and sufficiency [
          <xref ref-type="bibr" rid="ref39">39</xref>
          ]—and how these
can be measured or expressed visually in images.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Advertisement in Retrieval-Augmented Generation</title>
        <p>
          Previous research has shown that users of conversational search engines have high confidence in the
information provided by LLMs, regardless of whether it is correct or not [
          <xref ref-type="bibr" rid="ref40">40</xref>
          ]. More closely related to our
task, another study found that people struggle to identify advertisements in generated responses [
          <xref ref-type="bibr" rid="ref41">41</xref>
          ].
Both findings underline the importance of identifying content, such as advertisements, that tries to
influence the opinion of the user.
        </p>
        <p>
          Given their ability to create content at scale, generative models have recently been studied for their
use in advertising [
          <xref ref-type="bibr" rid="ref42 ref43">42, 43</xref>
          ]. This includes the specific use case of trying to hide advertisements in the
output of LLMs [
          <xref ref-type="bibr" rid="ref44 ref45">44, 45</xref>
          ], as well as research on detecting these types of advertisements [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ]. Finally,
other related work comes from the field of marketing research that has explored how to integrate
advertisements covertly within other media long before the arrival of LLMs. The two forms most closely
related to our shared task are native advertising [
          <xref ref-type="bibr" rid="ref47 ref48">47, 48</xref>
          ] and product placement [
          <xref ref-type="bibr" rid="ref49 ref50">49, 50</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Lab Overview and Statistics</title>
      <p>For the sixth edition of the Touché lab, we received 62 registrations from 22 countries (vs. 68 registrations
in 2024). Most lab registrations came from India (19). Out of the 62 registered teams, 12 actively
participated in this year’s Touché edition (2, 4, 2, and 4 teams submitting valid runs for Task 1, 2, 3,
and 4, respectively). Active teams in previous editions were: 20 in 2024, 7 in 2023, 23 in 2022, 27 in 2021,
and 17 in 2020.</p>
      <p>
        We used TIRA [
        <xref ref-type="bibr" rid="ref51">51</xref>
        ] as the submission platform for Touché 2025 through which participants could
either submit code, software, or run files (https://tira.io). We tracked the resources of all executions with the alpha
version of the TIREx Tracker [
        <xref ref-type="bibr" rid="ref52">52</xref>
        ] that monitors the GPU/CPU/RAM usage over time and the energy
that an approach consumed (as well as other hardware/software specifications) in the ir_metadata
format [
        <xref ref-type="bibr" rid="ref53">53</xref>
        ]. Code and software submissions increase reproducibility, as the software can later be
executed on different data of the same format. For code and software submissions, a team implemented
their approach in a Docker image that they uploaded to their dedicated Docker registry in TIRA. For
code submissions, the TIRA client created a Docker image from the code of some git repository. By
ensuring that the repository is clean, i.e., all changes are committed and there are no untracked files, it
is possible to link a Docker image to the exact version of a git repository that produced a submission.
Software submissions, however, do not need to be linked to the git repository.
      </p>
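<p>The cleanliness check described above amounts to verifying that git reports no pending changes. The following sketch illustrates this; the function names are hypothetical and not TIRA's actual implementation:</p>

```python
import subprocess

def porcelain_is_clean(status_output):
    """A repository is clean iff 'git status --porcelain' prints nothing:
    no modified, staged, or untracked files remain."""
    return status_output.strip() == ""

def repo_is_clean(path="."):
    """Run the actual check against a local repository."""
    out = subprocess.run(["git", "status", "--porcelain"], cwd=path,
                         capture_output=True, text=True, check=True).stdout
    return porcelain_is_clean(out)
```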
      <p>Submissions in TIRA are immutable, and a team could upload as many code or software submissions
as they liked; only they and TIRA had access to their dedicated Docker image registry (the images were not public while the shared task was ongoing). To improve
reproducibility, TIRA executes submitted software in a sandbox by removing the internet connection.
This requires the software to be fully installed in the Docker image, including all libraries and models, and
thus eases re-running software later. Participants could select the resources available to their software
for execution, with options ranging from 1 CPU core with 10 GB RAM to 5 CPU cores with 50 GB RAM
and 1 Nvidia A100 GPU with 40 GB RAM. Participants could run their software multiple times using
different resources to study the scalability and reproducibility (e.g., whether the software executed on
a GPU yields the same results as on a CPU). TIRA used a Kubernetes cluster with 1,620 CPU cores,
25.4 TB RAM, 24 GeForce GTX 1080 GPUs, and 4 A100 GPUs to schedule and execute the software
submissions.</p>
      <sec id="sec-3-1">
<title>4. Retrieval-Augmented Debating</title>
        <p>The goal of this task is to create generative retrieval systems that engage in argumentative conversations
by presenting counterarguments to users’ claims. Such systems can be useful as educational tools to
train users’ argumentation skills or to explore the argument space on a topic to form or validate an
opinion. Participants of this task develop debate systems, which should generate persuasive responses
grounded in arguments from a provided argument collection.</p>
        <sec id="sec-3-1-1">
          <title>4.1. Task Definition</title>
          <p>
            Teams can participate in two sub-tasks: (1) developing debate systems, and (2) providing metrics to assess
various quality criteria based on Grice’s maxims of cooperative dialogs [
            <xref ref-type="bibr" rid="ref54">54</xref>
            ], specifically on the quantity
(length), quality (faithfulness), relevance (cf. argumentative quality), and manner (clarity) of system
responses. In sub-task 1, participants submit debate system software with which simulated users interact
in up to five turns. The submissions are assessed based on the resulting debates, which simultaneously
serve as evaluation data for sub-task 2. The debates are annotated according to the annotation schema
mentioned above, and submissions to sub-task 2 are assessed based on their correlation strength with
human judgments.
          </p>
        </sec>
        <sec id="sec-3-1-2">
          <title>4.2. Data Description</title>
          <p>
            Participants received an argument collection of about 300 000 arguments extracted from around 1 500
debates from the ClaimRev dataset [
            <xref ref-type="bibr" rid="ref55">55</xref>
            ]. For each of these arguments, the topic was specified, as well
as exactly one claim that is supported and one that is attacked by this argument. While only one
of the supported or attacked claims could be extracted from the ClaimRev dataset, the missing claim
was generated automatically: a semantic negation was produced with the help of Llama 3.1 in case the
attacked claim was missing, or the argument itself was used as the supported claim. The argument
collection was provided as a pre-computed Elasticsearch index that allows sparse retrieval with BM25
as well as dense retrieval with k-NN based on the argument text or supported and attacked claims. The
embeddings were pre-computed with the document encoder of the pre-trained Stella embedding model
[
            <xref ref-type="bibr" rid="ref56">56</xref>
] (checkpoint: dunzhang/stella_en_400M_v5). The data is available online (https://touche.webis.de/data.html#touche25-retrieval-augmented-debate-claims).
          </p>
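<p>To make the two retrieval modes concrete, the following sketch builds request bodies in the shape the Elasticsearch search API expects; the index and field names are assumptions for illustration, and the actual names used by the provided API may differ:</p>

```python
def bm25_body(text, field="argument", size=10):
    """Sparse BM25 retrieval over an argument (or claim) text field."""
    return {"size": size, "query": {"match": {field: text}}}

def knn_body(vector, field="argument_embedding", k=10):
    """Dense k-NN retrieval over the precomputed Stella embeddings."""
    return {"knn": {"field": field, "query_vector": vector, "k": k,
                    "num_candidates": 10 * k}}

# With the official Python client (not executed here), one would send
# these bodies via es.search(index=..., body=bm25_body("school uniforms")).
```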
          <p>Additionally, participants were provided a training set of 100 claims on various topics extracted from
the Change My View subreddit (https://www.reddit.com/r/changemyview/). From this subreddit, almost 2 000 threads were acquired through
Reddit’s API. From these 2 000 threads, an automatic preselection of 500 posts was made based on the
BM25 retrieval score according to keywords extracted from the title of the posts and the number
of relevant arguments from the ClaimRev index. From these 500 posts, 100 were manually selected
to ensure that claims are sufficiently backed up by arguments from the argument collection. These
100 posts underwent extensive automatic and manual post-processing to remove authors’ edits, special
characters, and other noise from the posts. These cleaned titles and contents of the posts were provided
as claims and descriptions, respectively.</p>
          <p>
            For each claim in the dataset, a debate was generated by simulating a discussion between a basic
user and a baseline debate system. Each of the system turns was manually annotated according to an
adaptation of Grice’s maxims of cooperation [
            <xref ref-type="bibr" rid="ref54">54</xref>
            ]. For the informal debate context of this shared task, we
reinterpreted these maxims as a binary classification schema in the following way:
• Quantity. Does the response contain at least one (attack or defense) argument, and at most one
of each type of defense and attack?
• Quality. Can the response be deduced from the retrieved arguments?
• Relation. Is the response coherent with the conversation, and does it express a contrary stance
to the user?
          </p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <p>• Manner. Is the response clear and precise?</p>
        <p>The claims, debates, and annotations were released together as a training dataset for sub-task 1 and
sub-task 2.</p>
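<p>The binary annotation schema above can be represented compactly as one record per system turn, from which per-maxim fulfillment rates follow directly. The names below are illustrative, not the organizers' annotation tooling:</p>

```python
from dataclasses import dataclass, asdict

@dataclass
class TurnJudgment:
    """One system turn, judged against the four reinterpreted maxims."""
    quantity: bool  # at least one argument; at most one attack and one defense
    quality: bool   # deducible from the retrieved arguments
    relation: bool  # coherent with the conversation, contrary stance
    manner: bool    # clear and precise

def fulfillment_rates(judgments):
    """Per-maxim fraction of turns judged positive across a debate."""
    maxims = ("quantity", "quality", "relation", "manner")
    return {m: sum(asdict(j)[m] for j in judgments) / len(judgments)
            for m in maxims}
```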
        <sec id="sec-3-2-1">
          <title>4.3. Participant Approaches</title>
          <p>In 2025, two teams participated in this task and submitted 19 runs. Moreover, we added two baseline
runs for comparison.</p>
          <p>
            Baselines. For sub-task 1, we provide a baseline that responds with the top claim retrieved without
rewriting by (default Elasticsearch) BM25 when the user’s utterance is matched with the attacked claim
of an indexed claim. For sub-task 2, we provide a 1-baseline, i.e., an evaluator that always produces the
maximum score of 1 for each dimension (all baselines were provided in Python; the sub-task 1 baseline also in JavaScript).
Team SINAI [
            <xref ref-type="bibr" rid="ref57">57</xref>
            ] This team (codename: Lewis Carroll) attempted both sub-task 1 and sub-task 2.
For sub-task 1, the team proposed a five-step approach which combines the reasoning abilities of an
LLaMA3-8B-Instruct model with the provided Elasticsearch API. The LLM first analyses how to answer
the question, then generates queries that are used to search Elasticsearch, then selects the arguments
across these queries, and finally generates the final counterargument. For sub-task 2, the team focused
on three LLM-based prompting methods to derive a measure for evaluating argument quality. Using
the same LLaMA3-8B-Instruct model, the team investigated zero-shot, few-shot, and analysis-based
few-shot approaches.
          </p>
          <p>Team DS@GT [<xref ref-type="bibr" rid="ref58">58</xref>] This team (codename: Haskell Curry) performed both sub-tasks by zero-shot
prompting an LLM, testing six different models: Anthropic Claude (opus4 and sonnet4), Google
Gemini 2.5 (flash and pro), and OpenAI GPT (4.1 and 4o). The prompt for sub-task 1 uses detailed
guidelines, requesting from the model direct engagement, logical reasoning, evidence-based argumentation, a
respectful and constructive tone, clarity and precision, brevity, and assertive utterances—
each of these with more details. The prompt for sub-task 2 features a specification for each metric.
Scores for all four metrics are requested at once.</p>
        </sec>
        <sec id="sec-3-2-2">
          <title>4.4. Task Evaluation</title>
          <p>Submissions for sub-task 1 are evaluated using a new set of 100 initial claims, obtained by following
the methodology of the training set creation. Debates for the assessment are generated in interaction
with various simulated users, each presenting different argument strategies, resulting in one simulated
debate for each combination of claim, user, and system. All debates are assessed using the evaluation
systems submitted for sub-task 2 and our baseline metrics. Each participant turn of a random subset
of 20 debates was judged by human experts according to the criteria of sub-task 2 to identify for
each criterion the evaluation system that aligns best with human judgment. Alignment with human
judgment is quantified by Precision, Recall, and F1 individually for each of the four maxims. The
respective evaluation systems are then used to assess the debate systems from sub-task 1. The final
scores are determined by averaging the percentages of responses that fulfill the maxims for sub-task 1
and the macro-averaged F1 scores of the classifiers across all maxims for sub-task 2.</p>
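<p>The sub-task 2 score described above reduces to a macro-average of binary F1 values over the four maxims. A minimal sketch, assuming gold and predicted judgments are given as 0/1 lists per maxim:</p>

```python
def binary_f1(gold, pred):
    """F1 over 0/1 judgments for a single maxim."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0

def macro_f1(gold_by_maxim, pred_by_maxim):
    """Sub-task 2 score: F1 averaged over the four maxims."""
    return sum(binary_f1(gold_by_maxim[m], pred_by_maxim[m])
               for m in gold_by_maxim) / len(gold_by_maxim)
```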
          <p>Table 1 presents the results and rankings of the participating systems, with Team DS@GT emerging
as the winner of sub-task 1 with its GPT-4.1-based zero-shot prompting approach. In general, there is a
considerable variance in the performance of the large closed-source models from Team DS@GT with
Claude models performing noticeably worse than Google’s Gemini models. While GPT-4.1 achieved the
best final results, GPT-4o fell short of expectations, particularly in terms of the quality maxim. The</p>
        </sec>
      </sec>
      <sec id="sec-3-3">
        <p>approach of Team SINAI, employing a much smaller Llama 3 model, outperforms four of the large
closed-source models used by Team DS@GT, presumably due to its multi-stage reasoning approach.</p>
        <p>Table 2 shows the effectiveness of the classifiers submitted to sub-task 2. The results for sub-task 2
reveal even more clearly the performance difference between the large models used by Team DS@GT
and the much smaller Llama 3 model used by Team SINAI with the approach of Team DS@GT using
Gemini-2.5-flash emerging as winner of sub-task 2. However, GPT-4-based approaches and the other
Gemini variant are almost on par with the winning approach. Again, the Claude models performed
noticeably worse than most other closed-source models for this task. Surprisingly, the zero-shot run
(ironrythm) of Team SINAI performed better than the submitted few-shot runs (coped-message and sizzling
coloumb). However, the multi-stage reasoning runs (gritty-stock, radiant-thread, and grating-dragster)
outperform most of the other runs of that team.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>5. Ideology and Power Identification in Parliamentary Debates</title>
      <p>The study of parliamentary debates is crucial to understanding the decision processes in parliaments
and their societal impacts. The goal of this task is to automatically identify three important and
interacting aspects of parliamentary debates: the political orientation of the speaker’s party, the
role of the speaker’s party in the governance of the country or the region, and the place of the party
on the populism–pluralism scale. Identifying these underlying aspects of parliamentary debates enables
automated comprehension of these discussions, the decisions that these discussions lead to, and their
consequences.</p>
      <sec id="sec-4-1">
        <title>5.1. Task Definition</title>
        <p>
          The first two sub-tasks (orientation and power identification) were defined as binary classification tasks:
Given a parliamentary speech, (1) predict the political orientation of the party of the speaker on the
left–right spectrum, and (2) predict whether the speaker belongs to one of the governing parties or the
opposition. The third sub-task, populism identification, which was introduced in this year’s competition,
is a multi-class (ordinal) classification task with four levels: strongly pluralist, moderately pluralist,
moderately populist, strongly populist. The first task is relatively well studied, and there have been
some recent shared tasks on identifying political orientation [
          <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
          ]. Unlike the earlier tasks, our data
set includes multiple parliaments and languages, and is based on parliamentary debates. To the best of
our knowledge, this is the first shared task on identifying power roles and populism.
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>5.2. Data Description</title>
        <p>
          The source of the data for this task is ParlaMint version 4.1 [59], a uniformly encoded and annotated
corpus of transcripts of parliamentary speeches from multiple national and regional parliaments.7
The ParlaMint version 4.1 used for the task includes data from the following national and regional
parliaments: Austria (AT), Bosnia and Herzegovina (BA), Belgium (BE), Bulgaria (BG), Czechia (CZ),
Denmark (DK), Estonia (EE), Spain (ES), Catalonia (ES-CT), Galicia (ES-GA), Basque Country (ES-PV),
Finland (FI), France (FR), Great Britain (GB), Greece (GR), Croatia (HR), Hungary (HU), Iceland (IS), Italy
(IT), Latvia (LV), The Netherlands (NL), Norway (NO), Poland (PL), Portugal (PT), Serbia (RS), Sweden
(SE), Slovenia (SI), Turkey (TR), and Ukraine (UA). The labels for the first two sub-tasks are also coded in
the ParlaMint corpora. For the sake of simplicity, we formulate both tasks as binary classification tasks.
For the populism task, we combine labels obtained through multiple expert surveys [
          <xref ref-type="bibr" rid="ref25">25, 61, 62</xref>
          ].
        </p>
        <p>For all tasks, the main challenge in the creation of a dataset is to minimize the effects of covariates [63].
Even though the instances to classify are speeches, the annotations are based on the party membership
of the speaker. As a result, underlying variables like party membership or speaker identity perfectly
covary with ideology and power in most cases. In this year’s shared task, we opted for a speaker-based
split of training and test set, where the same speaker is included only in the training set or only in
the test set. We sample at most 20 speeches from a single speaker. For evaluation, we set aside
a test set of 2 000 instances (approximately 100 to 200 speakers, depending on the individual corpus).
We do not provide a fixed validation (or development) set. Participants were expected to do their own
training/validation splits or use cross-validation for improving their approaches. Training set sizes vary
(min: 221, max: 10 000, mean: 4588) depending on data availability. For parliaments with more
than 10 000 speeches available for the training set, we reduce the number of speeches sampled per speaker to
limit the total to approximately 10 000 speeches.</p>
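        <p>As a minimal sketch (not the organizers’ actual code), the speaker-based split with per-speaker sampling could look as follows; the speaker_id field name and the test fraction are illustrative assumptions:
```python
import random
from collections import defaultdict

def speaker_split(speeches, test_frac=0.1, max_per_speaker=20, seed=0):
    # Group speeches by speaker so each speaker lands in exactly one split.
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for speech in speeches:
        by_speaker[speech["speaker_id"]].append(speech)
    speakers = sorted(by_speaker)
    rng.shuffle(speakers)
    n_test = max(1, int(len(speakers) * test_frac))
    test_speakers = set(speakers[:n_test])
    train, test = [], []
    for speaker, items in by_speaker.items():
        # Sample at most max_per_speaker speeches per speaker.
        sampled = rng.sample(items, min(len(items), max_per_speaker))
        (test if speaker in test_speakers else train).extend(sampled)
    return train, test
```
Splitting by speaker rather than by speech prevents the classifier from recognizing a speaker’s idiolect instead of the ideology or power label.</p>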
        <p>Except for a few parliaments with limited data and a lack of variation (e.g., ES-GA), orientation labels
are relatively complete in this year’s shared task data. However, some countries do not have the
opposition–governing party distinction, and the expert surveys on populism do not cover all parties
in the ParlaMint data. As a result, there are missing labels for some sub-task–parliament pairs. In
addition to the original speech transcripts and labels, we also provide automatic English translations,
an anonymized speaker ID, and the speaker’s sex. Labels and speaker IDs were hidden in the test set.
The shared task data is publicly available.8
7Although all transcripts are obtained through the data published by the respective parliaments, the method for obtaining the
transcripts varies, such as scraping the website of the parliament, extracting from published PDF files, or obtaining them through
an API provided by the parliament. For details, we refer to [59, 60].</p>
      </sec>
      <sec id="sec-4-3">
        <title>5.3. Participant Approaches</title>
        <p>In 2025, four teams participated in this task (all four submitted a notebook paper) and submitted 20 runs.
Moreover, we added a single baseline run for comparison. As last year, most participants relied on
either computationally efficient methods or participated with a focused approach on a subset of the
parliaments or data.</p>
        <p>Baseline. We provided only a single simple baseline using a logistic regression classifier with tf-idf
weighted character n-grams. The baseline is intentionally kept simple to encourage participation by
early researchers.</p>
        <p>Team GIL_UNAM_Iztacala [64] participated in all sub-tasks using traditional classifiers based on
n-gram features. They experiment with a large number of classifiers, including Naive Bayes, Logistic
Regression, Support Vector Machines, and Random Forests. The optimal model was found through a
grid search over the hyperparameters of each classifier and a few optional preprocessing choices.</p>
        <p>Team Munibuc [65] participated in sub-task 1 (orientation) and sub-task 3 (populism). Their approach
was based on extracting task-oriented embeddings from the provided English translations of the
parliamentary speeches with NV-Embed-v2 [66] (with a Mistral-7b [67] backbone), and using support
vector classifiers on the extracted embeddings.</p>
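        <p>The simple baseline described above, logistic regression over tf-idf weighted character n-grams, can be sketched with scikit-learn; the n-gram range, the analyzer choice, and the toy labels are assumptions, not the task’s actual configuration:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Character n-grams within word boundaries; the (2, 5) range is an assumption.
baseline = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5)),
    LogisticRegression(max_iter=1000),
)

# Toy training data; the label coding (1 = right, 0 = left) is illustrative.
texts = ["lower taxes and deregulate markets", "expand social welfare programs"]
labels = [1, 0]
baseline.fit(texts, labels)
predictions = baseline.predict(["we must expand welfare"])
```
Character n-grams make the baseline language-agnostic, which matters for a data set spanning 29 parliaments.</p>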
        <p>Team TüNLP [68] submitted results only for sub-task 1 (orientation), based on fine-tuning
XLM-RoBERTa [69]. The approach involves fine-tuning XLM-RoBERTa-large on the combined training data
from all parliaments. The approach is interesting as it allows exploring how multi-lingual data can be
exploited to improve classification in low-resource settings, and it may potentially be useful for identifying
the differences across languages and cultures.</p>
        <p>Team DEMA2IN [70] contributed to the shared task with a focused participation on data from
a single parliament (GB). Their approach is based on extracting salient events using Mistral-7b v0.2
Instruct [67]. With the intuition that salient events and the way they are described are important
indicators of political stance, the approach classifies the speeches based only on these event
descriptions.</p>
      </sec>
      <sec id="sec-4-4">
        <title>5.4. Task Evaluation</title>
        <p>We use the macro-averaged F1-score as the main evaluation metric for all sub-tasks. For the binary tasks,
participants were encouraged to submit confidence scores, where a score over 0.5 is interpreted as
class 1 and otherwise as class 0.</p>
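        <p>A minimal sketch of this evaluation, thresholding submitted confidence scores at 0.5 and computing the macro-averaged F1-score in plain Python (no external dependencies):
```python
def macro_f1(y_true, y_pred, labels=(0, 1)):
    # Average the per-class F1-scores, the task's main metric.
    f1_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# A confidence score over 0.5 is read as class 1, otherwise as class 0.
confidences = [0.9, 0.3, 0.6, 0.2]
y_pred = [1 if s > 0.5 else 0 for s in confidences]
y_true = [1, 0, 0, 0]
score = macro_f1(y_true, y_pred)
```
Macro-averaging weights both classes equally, which is important for imbalanced government–opposition distributions.</p>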
        <p>The scores of the participants are summarized in Tables 3, 4, and 5 for the orientation, populism, and power
tasks, respectively. In addition to scores averaged over all parliaments, we also present scores on the data
from the parliament of GB for each sub-task, to allow a rough comparison with teams participating only
on this data set and to showcase a data-rich, high-resource case.</p>
        <p>Like last year, we see a relatively large number of traditional systems used by the participants. This
is likely due to the high computational cost of (large) language models on the long texts that are typical for
the data set, as well as their limited support for non-English data. LLMs are used by multiple teams to
extract features, and one team fine-tuned a multi-lingual pretrained encoder-only model (XLM-R).
The variation across different approaches is relatively low: both the use of traditional classifiers with
varying feature sets and the fine-tuning of language models seem to result in similar scores across the tasks.
We also observe that populism detection scores are low compared to the other two tasks, likely because
of the multi-class classification setting.
8Training and test data are available at https://doi.org/10.5281/zenodo.14600017 and https://doi.org/10.5281/zenodo.15337704,
respectively.</p>
        <p>As expected, scores on the GB parliament are higher than average, both because it was one of
the largest parliaments in the training set and because it is likely to be supported better by existing
pre-trained models. The GB-only scores also allow observing the success of the approach by team
DEMA2IN. Since they use only a subset of the information (salient events) in the parliamentary speeches,
their scores are understandably lower than the baseline in general. However, the better score obtained
on the populism task perhaps indicates that events, e.g., Brexit, provide more valuable information for
detecting populism and polarization.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Image Retrieval/Generation for Arguments</title>
      <p>This task explores how images can be used to visually communicate the core message of an argument.
By visualizing key aspects through multimodal representations, arguments can become more engaging,
memorable, and accessible. In addition to clarifying complex ideas, images can enhance the persuasive
impact of an argument—for example, by highlighting central themes or evoking emotional responses.</p>
      <sec id="sec-5-1">
        <title>6.1. Task Definition</title>
        <p>Given a set of arguments, the task is to return multiple images for each argument that effectively
convey its meaning. Suitable images may either directly illustrate the argument or depict a related
generalization or specialization. These images can be sourced from a provided dataset or generated
using an image generation model. For each argument, five images should be submitted, ranked in order
of relevance.</p>
      </sec>
      <sec id="sec-5-2">
        <title>6.2. Data Description</title>
        <p>The task data includes 128 arguments covering 27 different topics. Each argument consists of a brief
claim, such as “Automation increases productivity in industries”. For participants using the retrieval
method, we created a dataset through a focused crawl, resulting in 32,462 webpages containing 32,339
images. In addition to website texts and images, the dataset includes supplementary information such
as automatically generated image captions [71]. Participants using the generation approach were
supported with access to a Stable Diffusion-based image generation API [72], building on the concept
of the Infinite Index [73].</p>
      </sec>
      <sec id="sec-5-3">
        <title>6.3. Participant Approaches</title>
        <p>In 2025, three teams participated in the task: two employed retrieval-based approaches, while the third
used a generation-based method. The teams collectively submitted seven runs, which were reduced to
five unique entries after deduplication. Each team also submitted an accompanying notebook paper.</p>
        <p>Baselines We provide two baseline models each for the retrieval and generation tasks. For retrieval,
we use two methods: one based on CLIP [74] embeddings to measure similarity between claims and
images, and another using SBERT [75] embeddings to compare argument claims with website text. For
generation, we use the claim itself as a prompt for the image generator. We evaluate two versions of
Stable Diffusion: stable-diffusion-3.5-medium and the older stable-diffusion-xl-base-1.0.</p>
        <p>Team CEDNAV–UTB [76] This team uses a retrieval-based approach, computing CLIP embeddings
for each claim and image caption, and comparing them using cosine similarity. The pairs are then
ranked based on the highest similarity score. Additionally, the authors measure the energy consumption
of their system over multiple runs.</p>
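        <p>The core ranking step of such a retrieval approach can be sketched as follows; the embedding vectors are placeholders here, whereas the team derives them from CLIP:
```python
import numpy as np

def rank_images(claim_embedding, caption_embeddings, top_k=5):
    # Cosine similarity between the claim and every image caption embedding.
    claim = claim_embedding / np.linalg.norm(claim_embedding)
    captions = caption_embeddings / np.linalg.norm(
        caption_embeddings, axis=1, keepdims=True
    )
    similarities = captions @ claim
    # Highest-similarity captions first; return (image index, score) pairs.
    order = np.argsort(-similarities)[:top_k]
    return [(int(i), float(similarities[i])) for i in order]
```
Returning the five highest-scoring images matches the task’s required submission format of five ranked images per argument.</p>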
        <p>Team Infotec+CentroGEO [77] This team evaluated several embedding approaches for retrieval
between images and claims using multimodal MCIP [78] and CLIP embeddings. SBERT embeddings
between claims and image captions were also used. An internal evaluation using a manually labeled
dataset showed that SBERT embeddings between arguments and image captions produced the best
results.</p>
        <p>Team Hanuman [79] This team uses an image generation pipeline. First, the LLaMA 3.2-3B [80]
model extracts key aspects relevant to each argument. These aspects, along with the original argument,
are provided as input to Mistral-7B [67], which generates a corresponding prompt for the image
generator, emphasizing the relevant aspects. Afterwards, the corresponding image is generated using
stable-diffusion-xl-base-1.0. A human expert reviews the generated image to verify whether it accurately
represents the argument and its aspects. If it does not, the prompt is modified to place greater emphasis
on the missing aspects. The generated images are ranked by first generating a description of each image
using LLaVA-1.5-13B [81], and then computing the cosine similarity between this description and the
prompt used to create the image, using SBERT.</p>
      </sec>
      <sec id="sec-5-4">
        <title>6.4. Task Evaluation</title>
        <p>When creating arguments for the task, the expert dataset creator envisioned a corresponding image
and identified two key aspects that should be depicted to support the argument. Each aspect in the
argument–image pair was rated on a scale from 0 to 2, reflecting how well it was visually represented.
The two aspect scores were combined to generate an overall score for each argument-image pair. This
annotation process was carried out by two independent annotators, and their scores were averaged to
determine the final score of an argument-image pair.</p>
        <p>We followed the TREC-style evaluation and calculated the Normalized Discounted Cumulative Gain
(NDCG) for each argument. To compute the corresponding Ideal DCG (IDCG), all images annotated
for each argument were taken into account. The final NDCG score was obtained by averaging the
NDCG values across all arguments. Thirteen arguments were excluded from the evaluation due to high
ambiguity or because they were particularly difficult to visualize.</p>
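        <p>The NDCG computation described above can be sketched in plain Python; the per-image gains would come from the averaged annotator scores:
```python
import math

def dcg(gains):
    # Discounted cumulative gain with the standard log2(rank + 1) discount.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(ranked_gains, all_judged_gains, k=5):
    # The ideal ranking considers every image annotated for the argument.
    ideal = sorted(all_judged_gains, reverse=True)[:k]
    idcg = dcg(ideal)
    return dcg(ranked_gains[:k]) / idcg if idcg else 0.0
```
The final task score averages ndcg over all non-excluded arguments.</p>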
        <p>An example argument, along with its associated aspects and corresponding retrieved and generated
images, is shown in Table 6. While the use of aspects helps reduce ambiguity, satisfying all individual
aspects does not necessarily fulfill the overall argument. As illustrated in Table 6, both images represent
aspects related to cars and transportation. However, the retrieved image fails to fully convey the
intended meaning of the argument.</p>
        <p>The results for the participants are summarized in Table 7 for retrieval, and in Table 8 for generation.
These findings indicate that the generative approach yields better scores overall. This advantage likely
stems from the method’s ability to produce more tailored and context-specific visuals, as demonstrated
in Table 6. When arguments are used directly as guidelines for image generation, however, the results
tend to focus on a single aspect and often fail to capture the full range of the argument. In contrast,
retrieved images are generally more generic and less aligned with the specific nuances of the argument.
In summary, retrieving or generating images for arguments remains a challenging task—especially
when visualizing abstract concepts.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>7. Advertisement in Retrieval-Augmented Generation</title>
      <p>The goal of this task is to explore native advertising in responses of search engines that use retrieval-augmented
generation. Search engines are central to the process of collecting information on a topic
and forming an opinion. Both established search engine operators like Google and Microsoft as well as
new players like You.com and Perplexity offer conversational search engines backed by LLMs. This
raises the question of whether the responses generated by LLMs could be biased to influence their users, for
instance by presenting a certain product in a favorable way. The task considers advertising both from
the perspective of search engine providers inserting advertisements through prompts, as well as from
that of users wanting to block advertisements in responses to their queries.</p>
      <p>Figure 1: Illustration of the two sub-tasks for the example query “Are chocolate covered strawberries a
popular dessert for special occasions?”. (a) Sub-Task 1 (Generation): given the query, a document context, and
the item to advertise (“Chocolate Strawberries by Choc on Choc”), the model generates a response such as
“Chocolate covered strawberries, a gourmet treat from Choc on Choc, are indeed a popular dessert for special
occasions. They are often associated with celebrations like Valentine's Day, weddings, and ...”.
(b) Sub-Task 2 (Classification): given a query and a generated response, the model classifies whether the
response contains an advertisement (yes/no).</p>
      <sec id="sec-6-1">
        <title>7.1. Task Definition</title>
        <p>The task is split into two sub-tasks that ask participants to (1) generate or (2) classify responses. For
sub-task 1, the goal is to create relevant responses for a given query from a set of document segments.
When also provided with an item to advertise, i.e., a product or service, the response also needs to
advertise that item with a defined set of qualities. This advertisement should be difficult to detect and
fit seamlessly into the rest of the response. In sub-task 2, submitted systems receive a query and a
generated response, and are asked to classify whether the response contains an advertisement or not.
Figure 1 illustrates both sub-tasks.</p>
      </sec>
      <sec id="sec-6-2">
        <title>7.2. Data Description</title>
        <p>
          For development purposes, we provided participants with the Webis Generated Native Ads 2024
dataset [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ]. It contains 4,868 keyword queries, suitable items to be advertised, as well as 17,344 responses
generated by Microsoft Copilot and YouChat. A third of these responses contain advertisements that
were inserted with GPT-4o-mini.
        </p>
        <p>For the evaluation of submissions, we created a new version of this dataset starting from a set of 16
meta-topics with commercial relevance, like appliances, beauty, or vacation. For each meta-topic, we
collected up to 500 keyword queries and prompted GPT-4o-mini to generate an additional 100 natural
language queries users might ask in the context of the meta-topic. These include, for instance, the queries
“How to start a book club?” and “How do I make a stir-fry?” for the meta-topics books and food,
respectively. Next, we collected 160 topics from Google Trends 2024 and turned both the Google Trends
topics as well as the keywords for each meta-topic into natural language queries using GPT-4o-mini.
The keyword query lulus dresses, for example, was turned into the natural query “Are there any discounts
or sales on lulus dresses right now?”. The steps above resulted in a total of 9,062 queries. These natural
language queries were sent to the search engines Brave, Microsoft Copilot, Perplexity, and You.com to
collect a total of 35,416 responses. To collect real-world advertisements for the queries, we sent the keyword
queries for each meta-topic as well as the Google Trends topics to startpage.com.10 In total, we collected
11,613 unique products and services to be paired with our queries. Using the query–advertisement
pairs, we prompted several LLMs to insert advertisements into the original responses collected from
the conversational search engines. In total, we created 16,051 responses with advertisements using
GPT-4o and -mini, as well as deepseek-r1-distill-llama-70b, llama-3.3-70b-versatile,
llama3-70b-8192, and qwen-2.5-32b via the groq-API.11
10The keyword queries resulted in more advertisements than the natural language counterparts.
11https://groq.com/</p>
        <p>We split the 51,467 responses into a training, a validation, and two test sets, ensuring no advertising
leakage between splits, as well as minimal query overlap. We assigned the first test set to sub-task 1
(generation). For each of the 1,530 queries in that set, we retrieved up to 100 document segments from the
segmented version of the MS MARCO v2.1 document corpus12 using Elasticsearch with BM25. Due to
computational constraints, we reduced the dataset to a subset of the 100 queries with the largest number
of unique URLs among their retrieved segments. Submissions to sub-task 1 receive each query and are
asked to generate a relevant response from a context of 20-100 document segments. Additionally, each
query is accompanied by 0-4 advertisements for which submissions need to create a separate response
each. We assigned the second test set to sub-task 2 (classification). It contains 6,748 responses: 2,055
with and 4,693 without advertisements. Submissions receive each of these responses alongside the
query, the name of the search engine that generated the response, and the name of the meta-topic of the
query, e.g., banking. Based on this input, the submissions need to classify the response.</p>
      </sec>
      <sec id="sec-6-3">
        <title>7.3. Participant Approaches</title>
        <p>
          In 2025, four teams participated in this task and submitted a notebook paper. Three of these teams
submitted a total of five runs to sub-task 1 and all four teams submitted a total of twelve runs to
sub-task 2. For comparison, we added one baseline run to sub-task 1 and four baselines to sub-task 2.
Baselines. For sub-task 1, we created a very simple baseline that repeated the document segment
with the highest BM25-score for a given query. If provided with an item to advertise, it added the
advertisement with a comma-separated list of qualities to the end of the response. For sub-task 2, we
added two approaches trained on the Webis Generated Native Ads 2024 dataset: A fine-tuned version of
all-MiniLM-L6-v2 [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ], and a naive Bayes classifier using scikit-learn.13 After being fitted on the training
data, the naive Bayes classifier was submitted as three different baselines with the probability thresholds
0.10, 0.25, and 0.40.
        </p>
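        <p>A minimal sketch of such a thresholded naive Bayes baseline with scikit-learn; the vectorizer choice and the toy data are assumptions, as the overview only specifies scikit-learn’s naive Bayes and the three thresholds:
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

model = make_pipeline(TfidfVectorizer(), MultinomialNB())
train_texts = [
    "buy our amazing strawberries from BrandX today",  # ad
    "strawberries are rich in vitamin C",              # no ad
    "try BrandY chocolate for the best gifting",       # ad
    "cocoa is made from roasted cacao beans",          # no ad
]
train_labels = [1, 0, 1, 0]  # 1 = contains an advertisement
model.fit(train_texts, train_labels)

def classify(texts, threshold):
    # One fitted model serves as three baselines via thresholds 0.10, 0.25, 0.40.
    ad_probs = model.predict_proba(texts)[:, 1]
    return [int(p >= threshold) for p in ad_probs]
```
Lowering the threshold trades precision for recall, which is why the same fitted model was submitted three times.</p>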
        <p>
          Team Git Gud [82] To select document segments for the context in sub-task 1, the team uses
transformer-based reranking with all-MiniLM-L6-v2 and ms-marco-MiniLM-L6-v2. The
segments are given to Qwen2.5 7B or Qwen3 4B to generate a baseline response that is free of
advertisements. For each advertisement, they generate up to three variants of the baseline by inserting a
sentence with the ad. From these variants, they select the one with the highest value for a custom
"naturalness" metric and ROUGE-1 overlap with the baseline. If their own classification model for
sub-task 2 is able to detect the ad, they regenerate the response to avoid detection. For sub-task 2, the authors
fine-tuned multiple transformer-based models on the Webis Generated Native Ads 2024 dataset [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ].
Specifically, they trained MPNet-Base-v2, RoBERTa-base/-large, DeBERTa-v3-base/-large, as
well as a RoBERTa-base checkpoint published on Hugging Face.14 Each model receives the response
as input, without additional data like the query, and classifies it.
        </p>
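        <p>The variant-selection step can be sketched as follows; this simplified version scores candidates by ROUGE-1 overlap with the ad-free baseline response only, whereas the team additionally combines it with their custom naturalness metric:
```python
from collections import Counter

def rouge1_f1(candidate, reference):
    # Unigram-overlap ROUGE-1 F1 between a variant and the ad-free response.
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[token], ref[token]) for token in cand)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def select_variant(variants, ad_free_response):
    # Keep the ad-bearing variant that stays closest to the original response.
    return max(variants, key=lambda v: rouge1_f1(v, ad_free_response))
```
High overlap with the ad-free response means the inserted sentence changes the surrounding text as little as possible, making the ad harder to detect.</p>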
        <p>
          Team JU-NLP [83] For sub-task 1, the team fine-tuned Mistral 7b to generate responses. The
generation model was trained with Odds Ratio Preference Optimization (ORPO) [84] on pairs of
responses with preference judgments obtained by another instance of Mistral 7b. A response is
considered more preferable than another if (1) it is more fluent and (2) the inserted advertisement
is more dificult to detect. For sub-task 2, the team submitted two approaches. The first one uses a
version of MPNet-Base-v2 fine-tuned on the Webis Generated Native Ads 2024 dataset [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ]. The
classification is made on the full response without additional data. The second approach is based on
DeBERTa-v3-base, fine-tuned on query–response prompts derived from the same dataset. To make a
prediction, the query and response are put into a prompt template that asks the model whether the
response contains an advertisement or not.
12https://trec-rag.github.io/about/
13https://scikit-learn.org
14https://huggingface.co/0x7o/roberta-base-ad-detector</p>
        <p>Team Pirate Passau [85] This team submitted several approaches to sub-task 2. As a baseline, the
responses are represented as sparse vectors with TF-IDF weights, which are then fed into a random
forest classifier. Building on their baseline, two approaches using sentence transformers are proposed.
The first one replaces the TF-IDF vectors with embeddings from all-MiniLM-L6-v2 that are fed into a
random forest classifier. The second one is similar to our baseline approach and based on fine-tuned
versions of all-MiniLM-L6-v2 and MPNet-Base-v2 for binary classification. The team also proposes
a decoder-based approach using few-shot prompting with Llama3.1 and Qwen2.5. Finally, the team
implemented an approach inspired by RAG pipelines that (1) stores an embedding representation for
each response in the training and validation set, (2) retrieves the ten most similar responses for the
query of a response that should be classified, (3) re-ranks these responses, and (4) provides the four
most similar responses (two with and two without advertisements) as examples to Llama3.1, which
then classifies the response.</p>
        <p>
          TeamCMU [86] To augment both sub-tasks, the team synthesized an additional dataset consisting
of two types of synthetic data. First, they created the NaiveSynthetic dataset, using multiple language
models to generate responses with fictional advertisements that each model considers best suited to
the given response. Second, they constructed the StructuredSynthetic dataset, systematically selecting
and summarizing real-world products from Wikipedia using GPT-4o, to create responses which included
subtle advertisement examples (hard positives) and purely informative examples without advertisements
(hard negatives). For sub-task 1, the team developed a modular pipeline consisting of a question
answering system based on Qwen2.5-7B-Instruct and an Ad-Rewriter, fine-tuned with feedback
from an Ad-Classifier. The Ad-Rewriter uses a best-of-N sampling method, selecting responses the
classifier is least likely to identify as advertisements. The classifier (DeBERTa-base) was first trained
on the Webis Generated Native Ads 2024 dataset [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ], then improved through training on the synthetic
datasets and responses created from the Ad-Rewriter. The same classifier was submitted to sub-task 2.
        </p>
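        <p>The best-of-N feedback loop can be sketched abstractly; generate and ad_probability below are stand-ins for the team’s rewriter and fine-tuned classifier:
```python
def best_of_n(query, generate, ad_probability, n=4):
    # Draw n candidate rewrites and keep the one the ad classifier
    # is least likely to flag as containing an advertisement.
    candidates = [generate(query, seed=i) for i in range(n)]
    return min(candidates, key=ad_probability)
```
Because the selection directly minimizes the classifier’s confidence, the rewriter is optimized against the very model later used for evaluation-style detection.</p>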
      </sec>
      <sec id="sec-6-4">
        <title>7.4. Task Evaluation</title>
        <p>
          The evaluation of both sub-tasks is based on classification effectiveness. For sub-task 1, we added a
linear layer to modernbert-embed-base15 and fine-tuned it on the training split of the new dataset
mentioned in Section 7.2, following the same setup as Schmidt et al. [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ]. Evaluated on the classification
test split, the fine-tuned model achieves a precision of 95.31 % and a recall of 97.86 %. We apply this
classifier to all responses generated by submissions to sub-task 1 and score them based on the false
negative rate (FNR) of the classifier. We call this measure the evasion score to better illustrate its use in the
context of our task:
15https://huggingface.co/nomic-ai/modernbert-embed-base
        </p>
        <p>Evasion Score (FNR) = 1 − Recall
The evasion score of a submission increases with the number of ads it successfully hides from the
classifier. As additional context, we report the precision of the classifier, but do not include it in the
score. Low precision values indicate that a submission’s responses generally have an ad-like character,
a property that should be avoided. For sub-task 2, we measure the effectiveness of a submission using
F1-score on the classification test split.</p>
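        <p>For concreteness, the evasion score can be computed directly from the classifier’s confusion counts on responses that contain ads:
```python
def evasion_score(true_positives, false_negatives):
    # FNR = 1 - recall: the share of ad-bearing responses the classifier misses.
    ads_total = true_positives + false_negatives
    return false_negatives / ads_total if ads_total else 0.0
```
A submission that hides 17 of 100 inserted ads from the classifier thus receives an evasion score of 0.17.</p>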
        <p>Sub-Task 1 In sub-task 1, the most effective submissions are those by Team JU-NLP. Their two
fine-tuned Mistral 7b models achieve the highest evasion scores of 0.28 and 0.17, indicating that some of
their generated ads blend in with the rest of the response. At the same time, the precision values of our
classifier are very high, suggesting that the responses without ads do not exhibit the characteristics of
the ads in the classifier’s training data. The Ad-Rewriter by TeamCMU, which is optimized on feedback
from their classification model, also generates ads that are difficult to detect, with an evasion score
of 0.14. The precision value, however, is noticeably lower than that of the other submissions at 0.82.
Hence, a higher share of responses without ads has characteristics that our classifier associates with
advertisements. The two submissions by team Git Gud achieve similar evasion scores of 0.09 and 0.08,
both at high precision values of 0.96 and 0.98. This suggests that both the responses with and without
ads are similar to their counterparts in the classifier’s training data. The generations of the baseline are
almost always detected. The evaluation is summarized in Table 9.</p>
        <p>Beyond the automatic evaluation of submissions, we manually examined a sample of up to 100
responses per submission.16 This allows us to (1) review the generated responses and (2) analyze the
behavior of our classifier. Our first finding is that the vast majority of generated responses are valid
and relevant to the query. Apart from that, we observed seven responses from Qwen3 4B and two
from Qwen2.5 7B by Team Git Gud that contain chain-of-thought statements by the model like a
repetition of the qualities to advertise or reflections about the optimal position of the ad. Furthermore,
both versions of team JU-NLP’s Mistral 7b model fail to generate responses for the query “What can
you tell me about west USA realty trends in 2023?”. Across all teams, we found 20 responses in which
the qualities of the advertisement are assigned to a different entity than the item to advertise. This
happens exclusively for very general items like “health insurance plan” that lack a brand to be more
clearly identified. As a consequence, our classifier incorrectly labels these responses as not containing
an advertisement. Another source of false negatives are items that are nearly identical to the query
and thus blend in better with the rest of the response. We again observed this in 20 cases across all
teams, with examples such as the item “PlayStation 4 console” for the query “Can I play online games
with the PS4 console?” or “UnitedHealthcare” for “Is there a mobile app for accessing United Healthcare
online?”. Finally, the classifier fails to identify ads in which the qualities are spread throughout the
response or ads that start with formulations such as “additionally”, “in addition” or “for example”, that
suggest a connection between the ad and the rest of the response. Looking at the false positives, the
classifier often labels sentences with boldface or headline formatting as advertising. This occurred for
26 responses across all teams, 21 of which come from TeamCMU’s Ad-Rewriter. The higher prevalence
of this formatting in the Ad-Rewriter’s responses partly explains the comparatively lower precision in
Table 9. Additionally, the classifier falsely labels responses as containing an ad when they use the verb
“consider” (13 responses) or feature a very positive vocabulary (8 responses).</p>
        <p>Sub-Task 2 The most effective approach is the fine-tuned version of DeBERTa-v3-base by team
JU-NLP, which achieves an F1-score of 0.77. In contrast to the next most effective approaches, its precision
and recall are fairly similar, indicating a balance between finding as many advertisements as possible
while retaining decent precision. The second and third most effective approaches also use a
fine-tuned DeBERTa variant: the second version of DeBERTa-v3-large submitted by team Git Gud and
DeBERTa-v3-base by TeamCMU both achieve an F1-score of 0.64. These two approaches and the
fine-tuned version of RoBERTa-large by team Git Gud all achieve very high precision values of
0.95–0.99 with recall values between 0.46 and 0.48. The four approaches mentioned above all perform better
than the fine-tuned all-MiniLM-L6-v2 we included as a baseline. The most effective submission by
team Pirate Passau is their fine-tuned version of MPNet-Base-v2 with an F1-score of 0.56, a precision
of 0.40, and a recall of 0.92. It is followed by the TF-IDF classifier by Pirate Passau, the fine-tuned
versions of MPNet by JU-NLP, and our naive Bayes classifiers with probability thresholds of 0.10 and
0.25. Interestingly, the first version of DeBERTa-v3-large by Git Gud is noticeably less effective than
the second version, with an F1-score of 0.33. Finally, the Llama3.1-based approach by Pirate Passau
labels only two of the 6,748 responses as containing advertisements. The effectiveness scores of all
approaches are summarized in Table 10.
16 For each submission, we sampled 40 false positives, 40 false negatives, 10 true positives, and 10 true negatives from our
classifier. Some submissions had fewer than 40 false positives/negatives.</p>
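<p>The relation between the reported precision, recall, and F1 values can be checked with the standard harmonic-mean formula. The balanced 0.77/0.77 pair below is an illustrative assumption for DeBERTa-v3-base (the text only states that its precision and recall are fairly similar); the 0.40/0.92 pair is quoted in the text:

```python
# Standard F1: the harmonic mean of precision and recall.
def f1(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Balanced pair (illustrative assumption for JU-NLP's DeBERTa-v3-base):
print(round(f1(0.77, 0.77), 2))  # 0.77
# Skewed pair (Pirate Passau's MPNet-Base-v2, values from the text):
print(round(f1(0.40, 0.92), 2))  # 0.56 -- pulled toward the lower value
```

This illustrates why a balanced classifier and a high-recall, low-precision classifier can sit close together in an F1 ranking despite very different behavior.</p>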
        <p>Cross Evaluation of Sub-Tasks As an additional experiment, we ran all classifiers submitted to
sub-task 2 on the responses generated by the submissions to sub-task 1. The detailed effectiveness scores
can be found in Tables 12-14 in Appendix A. We aggregated these scores to evaluate how effective each
classifier is on the responses generated by the same team vs. on those generated by other teams.17 The
summary of that comparison is given in Table 11. The classifiers of team JU-NLP have consistently lower
recall values on the responses generated by their own submitted generators than on those generated
by the submissions of Git Gud and TeamCMU. With one exception, however, the precision values are
higher for their own responses. The differences in F1-score are comparatively small, with (slightly) higher
values for the responses by other teams. TeamCMU optimized the response generation against their
own classifier. This is reflected in the effectiveness scores, as the F1-score of the classifier is more than
twice as high on responses by other teams (0.78 vs. 0.36). This difference stems almost exclusively from
a lower recall of 0.23 on their own responses vs. 0.66 on those generated by Git Gud and JU-NLP. Team
Git Gud also use their classifier in response generation by regenerating a response if it is detected by the
classifier. This, however, does not produce the same effect as for TeamCMU. Instead, their classifier
is consistently more effective on their own responses than on those by JU-NLP and TeamCMU.
17 The approach “Finetuned_MPNET” by JU-NLP fails on the responses generated by Qwen3 and is omitted from the analyses
for that dataset.</p>
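<p>The own-team vs. other-team comparison amounts to a small aggregation over (classifier team, generator team) pairs. The per-pair scores below are hypothetical placeholders, chosen only to be consistent with the TeamCMU numbers quoted above (0.36 vs. 0.78); the actual per-pair values are in Tables 12-14:

```python
from statistics import mean

# Hypothetical per-pair F1-scores: (classifier team, generator team) -> F1.
scores = {
    ("TeamCMU", "TeamCMU"): 0.36,
    ("TeamCMU", "Git Gud"): 0.78,
    ("TeamCMU", "JU-NLP"): 0.78,
}

def own_vs_others(scores, team):
    """Mean F1 on a team's own generated responses vs. on other teams'."""
    own = [f for (clf, gen), f in scores.items() if clf == team and gen == team]
    others = [f for (clf, gen), f in scores.items() if clf == team and gen != team]
    return mean(own), mean(others)

print(own_vs_others(scores, "TeamCMU"))  # (0.36, 0.78)
```

A large gap between the two aggregates indicates that a generator was tuned to evade its own team's classifier, as observed for TeamCMU.</p>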
      </sec>
    </sec>
    <sec id="sec-7">
      <title>8. Conclusion</title>
      <p>The sixth edition of the Touché lab on argumentation systems featured four tasks: (1) Retrieval-Augmented
Debating, (2) Ideology and Power Identification in Parliamentary Debates, (3) Image
Retrieval/Generation for Arguments, and (4) Advertisement in Retrieval-Augmented Generation. We
added two new tasks, one featuring interactive evaluation of argumentation systems and the other
one focusing on the generation and detection of advertisements in generative retrieval systems. In
comparison to last year, the Ideology and Power Identification in Parliamentary Debates task included
an additional sub-task on populism classification. Moreover, for the Image Retrieval/Generation for
Arguments task, we changed the task from providing pro and con images for a topic to the less ambiguous
task of providing images that convey a claim.</p>
      <p>Of the 62 registered teams, 12 participated in the tasks and submitted a total of 60 runs. Unsurprisingly,
large language models and generative approaches were used across tasks. For the Retrieval-Augmented
Debating task, teams prompted language models in various ways to retrieve, select, phrase, and
evaluate. For the Ideology and Power Identification in Parliamentary Debates task, teams used varying
approaches, including traditional classifiers, fine-tuning encoder-only language models, and
prompting-based approaches using large language models. For the Image Retrieval/Generation for Arguments
task, teams used CLIP to retrieve relevant images and Stable Diffusion to generate new ones. For the
Advertisement in Retrieval-Augmented Generation task, teams primarily used encoder models like
MiniLM, MPNet, RoBERTa, and DeBERTa-v3 to perform advertisement detection. The generation of
responses was done with different versions of the Qwen and Mistral models.</p>
      <p>We plan to continue Touché as a collaborative platform for researchers in argumentation systems.
All Touché resources are freely available, including topics, manual relevance, argument quality, and
stance judgments, and submitted runs from participating teams. In all Touché labs combined, we
received 384 runs from 106 teams. We manually labeled the relevance and quality of more than
42,000 argumentative texts, debates, web documents, and images for 327 topics (topics and judgments
are publicly available at the lab’s web page, https://touche.webis.de). These resources and other events
such as workshops will help to further foster the community working on argumentation systems.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the European Commission under grant agreement GA 101070014
(https://openwebsearch.eu) and by the German Federal Ministry of Education and Research (BMBF)
through the project “DIALOKIA: Überprüfung von LLM-generierter Argumentation mittels
dialektischem Sprachmodell” (01IS24084A-B).</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used DeepL, Grammarly, and LanguageTool in order
to: check grammar and spelling, paraphrase and reword. Further, the authors used Stable Diffusion 3.5
for Table 6 in order to: generate images (in line with the section’s core topic). Further, the authors used
ChatGPT in order to: paraphrase and reword, improve writing style. After using these tools/services,
the authors reviewed and edited the content as needed and take full responsibility for the publication’s
content.
CEUR Workshop Proceedings, 2025.
[58] A. Miyaguchi, C. Johnston, A. Potdar, DS@GT at Touché: Large Language Models for
RetrievalAugmented Debate, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF
2025 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, 2025.
[59] T. Erjavec, M. Ogrodniczuk, P. Osenova, N. Ljubešić, K. Simov, A. Pančur, M. Rudolf, M. Kopp,
S. Barkarson, S. Steingrímsson, et al., The ParlaMint Corpora of Parliamentary Proceedings,
Language resources and evaluation 57 (2023) 415–448.
[60] T. Erjavec, M. Kopp, N. Ljubešić, T. Kuzman, P. Rayson, P. Osenova, M. Ogrodniczuk, Ç. Çöltekin,
D. Koržinek, K. Meden, et al., ParlaMint II: Advancing Comparable Parliamentary Corpora Across
Europe, Language Resources and Evaluation (2024) 1–32.
[61] A. Lührmann, N. Düpont, M. Higashijima, Y. B. Kavasoglu, K. L. Marquardt, M. Bernhard, H. Döring,
A. Hicken, M. Laebens, S. I. Lindberg, J. Medzihorsky, A. Neundorf, O. J. Reuter, S. Ruth-Lovell,
K. R. Weghorst, N. Wiesehomeier, J. Wright, N. Alizada, P. Bederke, L. Gastaldi, S. Grahn, G.
Hindle, N. Ilchenko, J. von Römer, S. Wilson, D. Pemstein, B. Seim, Varieties of Party Identity and
Organization (V-Party) Dataset V1, 2020. doi:10.23696/vpartydsv1, date accessed: 22 February
2021.
[62] D. Pemstein, K. L. Marquardt, E. Tzelgov, Y.-t. Wang, J. Medzihorsky, J. Krusell, F. Miri, J. von Römer,
The V-Dem Measurement Model: Latent Variable Analysis for Cross-National and Cross-Temporal
Expert-Coded Data, 2020.
[63] Ç. Çöltekin, M. Kopp, K. Meden, V. Morkevičius, N. Ljubešić, T. Erjavec, Multilingual Power and
Ideology identification in the Parliament: a reference dataset and simple baselines, in: D. Fiser,
M. Eskevich, D. Bordon (Eds.), Proceedings of the IV Workshop on Creating, Analysing, and
Increasing Accessibility of Parliamentary Corpora (ParlaCLARIN) @ LREC-COLING 2024, ELRA
and ICCL, Torino, Italia, 2024, pp. 94–100. URL: https://aclanthology.org/2024.parlaclarin-1.14/.
[64] J. Vázquez-Osorio, L. A. H. Miranda, G. S. Adrián Juárez-Pérez, G. Bel-Enguix, GIL_UNAM_Iztacala
at Touché: Benchmarking Classical Models for Multilingual Political Stance and Power
Classification, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 – Conference
and Labs of the Evaluation Forum, CEUR Workshop Proceedings, 2025.
[65] M. Marogel, S. Gheorghe, Munibuc at Touché: Generalist Embeddings for Orientation and
Populism Detection, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF
2025 – Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, 2025.
[66] C. Lee, R. Roy, M. Xu, J. Raiman, M. Shoeybi, B. Catanzaro, W. Ping, NV-Embed: Improved
Techniques for Training LLMs as Generalist Embedding Models, arXiv preprint arXiv:2405.17428
(2024).
[67] A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F.
Bressand, G. Lengyel, G. Lample, L. Saulnier, L. R. Lavaud, M.-A. Lachaux, P. Stock, T. L. Scao,
T. Lavril, T. Wang, T. Lacroix, W. E. Sayed, Mistral 7B, 2023. URL: https://arxiv.org/abs/2310.06825.
arXiv:2310.06825.
[68] A. Shamsutdinov, J. Cherta-Rodriguez, TüNLP at Touché: Finetuning Multilingual Models for
Ideology detection, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025
– Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings, 2025.
[69] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott,
L. Zettlemoyer, V. Stoyanov, Unsupervised Cross-lingual Representation Learning at Scale, in:
D. Jurafsky, J. Chai, N. Schluter, J. Tetreault (Eds.), Proceedings of the 58th Annual Meeting of the
Association for Computational Linguistics, Association for Computational Linguistics, Online,
2020, pp. 8440–8451. doi:10.18653/v1/2020.acl-main.747.
[70] B. Callac, A.-G. Bosser, F. D. de Saint-Cyr, E. Maisel, DEMA²IN at Touché: Salient Events Extraction
for Ideology and Power Identification in Parliamentary Debates, in: G. Faggioli, N. Ferro, P. Rosso,
D. Spina (Eds.), Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum,
CEUR Workshop Proceedings, 2025.
[71] M. Heinrich, J. Kiesel, M. Wolter, M. Potthast, B. Stein,
Touché25-Image-Retrieval-and-Generation-for-Arguments, 2024. doi:10.5281/zenodo.14258397.
[72] P. Esser, S. Kulal, A. Blattmann, R. Entezari, J. Müller, H. Saini, Y. Levi, D. Lorenz, A. Sauer,
F. Boesel, D. Podell, T. Dockhorn, Z. English, K. Lacey, A. Goodwin, Y. Marek, R. Rombach,
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis, 2024. URL: https://arxiv.org/abs/2403.03206. arXiv:2403.03206.
[73] N. Deckers, M. Fröbe, J. Kiesel, G. Pandolfo, C. Schröder, B. Stein, M. Potthast, The Infinite Index:
Information Retrieval on Generative Text-To-Image Models, in: J. Gwizdka, S. Y. Rieh (Eds.), ACM
SIGIR Conference on Human Information Interaction and Retrieval (CHIIR 2023), ACM, 2023, pp.
172–186. doi:10.1145/3576840.3578327.
[74] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin,
J. Clark, G. Krueger, I. Sutskever, Learning Transferable Visual Models From Natural Language
Supervision, in: M. Meila, T. Zhang (Eds.), Proceedings of the 38th International Conference on
Machine Learning, ICML 2021, volume 139 of Proceedings of Machine Learning Research, PMLR,
2021, pp. 8748–8763. URL: http://proceedings.mlr.press/v139/radford21a.html.
[75] N. Reimers, I. Gurevych, Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,
in: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing,
Association for Computational Linguistics, 2019. URL: https://arxiv.org/abs/1908.10084.
[76] D. A. G. Amaya, J. E. S. Castañeda, J. C. Martínez-Santos, E. Puertas, CEDNAV–UTB at Touché:
Efficient Image Retrieval for Arguments with CLIP, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina
(Eds.), Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum, CEUR
Workshop Proceedings, 2025.
[77] T. Ramirez-delreal, D. Moctezuma, G. Ruiz, M. Graf, E. Tellez, Infotec+CentroGEO at Touché:
MCIP, CLIP and SBERT as retrieval score, in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.),
Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum, CEUR Workshop
Proceedings, 2025.
[78] K. Schall, K. U. Barthel, N. Hezel, K. Jung, Optimizing CLIP Models for Image Retrieval with
Maintained Joint-Embedding Alignment, in: E. Chávez, B. B. Kimia, J. Lokoc, M. Patella,
J. Sedmidubský (Eds.), Similarity Search and Applications - 17th International Conference,
SISAP 2024, volume 15268 of Lecture Notes in Computer Science, Springer, 2024, pp. 97–110.
doi:10.1007/978-3-031-75823-2_9.
[79] S. Anand, M. Heinrich, Hanuman at Touché: Image Generation with Argument-Aspect Fusion,
in: G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 – Conference and
Labs of the Evaluation Forum, CEUR Workshop Proceedings, 2025.
[80] A. Dubey, et al., The Llama 3 Herd of Models, CoRR abs/2407.21783 (2024). doi:10.48550/ARXIV.2407.21783. arXiv:2407.21783.</p>
      <p>
[81] H. Liu, C. Li, Q. Wu, Y. J. Lee, Visual Instruction Tuning, in: A. Oh, T. Naumann, A. Globerson,
K. Saenko, M. Hardt, S. Levine (Eds.), Advances in Neural Information Processing Systems 36:
Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans,
LA, USA, December 10 - 16, 2023, 2023. URL: http://papers.nips.cc/paper_files/paper/2023/hash/
6dcf277ea32ce3288914faf369fe6de0-Abstract-Conference.html.
[82] S. Kamani, M. Taqi, M. A. Chaudhry, M. A. H. Hanif, F. Alvi, A. Samad, Git Gud at Touché:
Unified RAG Pipeline for Native Ad Generation and Detection, in: G. Faggioli, N. Ferro, P. Rosso,
D. Spina (Eds.), Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum,
CEUR Workshop Proceedings, 2025.
[83] A. Dutta, A. Majumdar, S. Biswas, D. Saha, P. Pal, JU-NLP at Touché: Covert Advertisement
in Conversational AI-Generation and Detection Strategies, in: G. Faggioli, N. Ferro, P. Rosso,
D. Spina (Eds.), Working Notes of CLEF 2025 – Conference and Labs of the Evaluation Forum,
CEUR Workshop Proceedings, 2025.
[84] J. Hong, N. Lee, J. Thorne, ORPO: Monolithic Preference Optimization without Reference Model,
in: Y. Al-Onaizan, M. Bansal, Y.-N. Chen (Eds.), Proceedings of the 2024 Conference on Empirical
Methods in Natural Language Processing, Association for Computational Linguistics, Miami,
Florida, USA, 2024, pp. 11170–11189. doi:10.18653/v1/2024.emnlp-main.626.
[85] T. A. Bouhairi, A. Alhamzeh, Pirate Passau at Touché: Do We Need to Get Complex? A Comparative
Analysis of Traditional and Advanced NLP Approaches for Advertisement Classification, in:
G. Faggioli, N. Ferro, P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 – Conference and
Labs of the Evaluation Forum, CEUR Workshop Proceedings, 2025.
[86] T. E. Kim, J. Coelho, G. Onilude, J. Singh, TeamCMU at Touché: Adversarial Co-Evolution for
Advertisement Integration and Detection in Conversational Search, in: G. Faggioli, N. Ferro,
P. Rosso, D. Spina (Eds.), Working Notes of CLEF 2025 – Conference and Labs of the Evaluation
Forum, CEUR Workshop Proceedings, 2025.</p>
    </sec>
    <sec id="sec-10">
      <title>A. Cross-Submission Results of Touché 2025: Advertisement in Retrieval-Augmented Generation</title>
      <p>[Table: per-dataset effectiveness scores of each sub-task 2 classifier (Git Gud: Deberta-Large-V2, Roberta-Large; JU-NLP: DebertaFineTuned, Finetuned_MPNET, Finetuned_MPNET_v2; Pirate Passau: All-mini-LM-v2-finetuned, all-mini+Random-forest, MPnet-finetuned, Tf-IDF-Logestic-Regression; TeamCMU: deberta-synthetic-curriculum; baselines: minilm-baseline, modernbert-embed-base, naive-bayes-10, naive-bayes-25, naive-bayes-40) on the responses generated by each sub-task 1 submission (columns: Qwen3, Qwen2.5, Mistral7b_v2, Mistral7b, Adrewriting). Only a single column of values survived extraction: 0.393, 0.513, 0.697, 1.000, 0.217, 0.292, 0.015, 0.753, 0.479, 0.678, 0.057, 0.910, 0.828, 0.262, 0.037.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          , Çağrı Çöltekin,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gohsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Heineking</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Aliannejadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Anand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Erjavec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kopp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ljubešić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Meden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mirzakhmedova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Morkevičius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Scells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wolter</surname>
          </string-name>
          , I. Zelch,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          , Overview of Touché 2025:
          <article-title>Argumentation Systems</article-title>
          , in: J.
          <string-name>
            <surname>C. de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>García Seco de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 16th International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Lecture Notes in Computer Science, Springer,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ionescu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.-C.</given-names>
            <surname>Stanciu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-G.</given-names>
            <surname>Andrei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Radzhabov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Prokopchuk</surname>
          </string-name>
          , Ştefan, LiviuDaniel, M.-G. Constantin,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovalev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Damm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rückert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Ben</given-names>
            <surname>Abacha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>García Seco de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bloch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Brüngel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Idrissi-Yaghir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Schäfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M. G.</given-names>
            <surname>Pakull</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bracke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Eryilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Becker</surname>
          </string-name>
          , W.-W. Yim,
          <string-name>
            <given-names>N.</given-names>
            <surname>Codella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Novoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Malvehy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <surname>R. J. Das</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>H. M.</given-names>
          </string-name>
          <string-name>
            <surname>Shan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Nakov</surname>
            , I. Koychev,
            <given-names>S. A.</given-names>
          </string-name>
          <string-name>
            <surname>Hicks</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Gautam</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Thambawita</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Halvorsen</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Fabre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Macaire</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Lecouteux</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Potthast</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Heinrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Kiesel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Anand</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Stein</surname>
          </string-name>
          , Overview of ImageCLEF 2025:
          <article-title>Multimedia Retrieval in Medical, Social Media and Content Recommendation Applications</article-title>
          , in: J.
          <string-name>
            <surname>C. de Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>García Seco de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 16th International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>Iordanou</surname>
          </string-name>
          , C. Rapanta, “
          <article-title>Argue With Me”: A Method for Developing Argument Skills</article-title>
          , Frontiers in Psychology 12 (
          <year>2021</year>
          ). doi:10.3389/fpsyg.2021.631203.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          , Science as Argument:
          <article-title>Implications for Teaching and Learning Scientific Thinking</article-title>
          ,
          <source>Science Education</source>
          <volume>77</volume>
          (
          <year>1993</year>
          )
          <fpage>319</fpage>
          -
          <lpage>337</lpage>
          . doi:10.1002/sce.3730770306.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Wambsganss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kueng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Soellner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Leimeister</surname>
          </string-name>
          ,
          <article-title>ArgueTutor: An Adaptive Dialog-Based Learning System for Argumentation Skills</article-title>
          ,
          <source>in: Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, CHI '21</source>
          ,
          <string-name>
            <surname>Association</surname>
          </string-name>
          for Computing Machinery, New York, NY, USA,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          . doi:
          <volume>10</volume>
          .1145/3411764.3445781.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Slonim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bilu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Alzate</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bar-Haim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bogin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bonin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Choshen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Cohen-Karlik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Dankin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Edelstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ein-Dor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Friedman-Melamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gavron</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gleize</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gretz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gutfreund</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halfon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hershcovich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Hoory</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hummel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jacovi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jochim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kantor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Konopnicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kons</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kotlerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krieger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lahav</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lavee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liberman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mass</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Menczel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Mirkin</surname>
          </string-name>
          , G. Moshkowich,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ofek-Koifman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Orbach</surname>
          </string-name>
          , E. Rabinovich,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rinott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shechtman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sheinwald</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shnarch</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Shnayderman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Spector</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sznajder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Toledo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Toledo-Ronen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Venezian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Aharonov</surname>
          </string-name>
          , An Autonomous Debating System,
          <source>Nature</source>
          <volume>591</volume>
          (
          <year>2021</year>
          )
          <fpage>379</fpage>
          -
          <lpage>384</lpage>
          . doi:
          <volume>10</volume>
          .1038/s41586-021-03215-w.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kiesel</surname>
          </string-name>
          , Ç. Çöltekin,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heinrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alshomary</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Longueville</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Erjavec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Handke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kopp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ljubešić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Meden</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mirzakhmedova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Morkevičius</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Reitis-Münstermann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Scharfbillig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Stefanovitch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          , Overview of Touché 2024:
          <article-title>Argumentation Systems</article-title>
          , in: L.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Mulhem</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Quénot</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Schwab</surname>
          </string-name>
          ,
          <string-name>
            <surname>G. M. D. Nunzio</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Soulier</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Galuscakova</surname>
            ,
            <given-names>A. G. S.</given-names>
          </string-name>
          <string-name>
            <surname>Herrera</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. 15th International Conference of the CLEF Association (CLEF</source>
          <year>2024</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Arian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shamir</surname>
          </string-name>
          ,
          <article-title>The primarily political functions of the left-right continuum</article-title>
          ,
          <source>Comparative politics 15</source>
          (
          <year>1983</year>
          )
          <fpage>139</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Vegetti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Širinić</surname>
          </string-name>
          ,
          <article-title>Left-right Categorization and Perceptions of Party Ideologies</article-title>
          ,
          <source>Political Behavior</source>
          <volume>41</volume>
          (
          <year>2019</year>
          )
          <fpage>257</fpage>
          -
          <lpage>280</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>T. van Dijk</surname>
          </string-name>
          ,
          <source>Discourse and Power</source>
          , Bloomsbury Publishing,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>N.</given-names>
            <surname>Fairclough</surname>
          </string-name>
          ,
          <source>Critical Discourse Analysis: The Critical Study of Language</source>
          , Longman applied linguistics, Taylor &amp; Francis,
          <year>2013</year>
          . doi:
          <volume>10</volume>
          .4324/9781315834368.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Fairclough</surname>
          </string-name>
          , Language and Power, Language In Social Life, Taylor &amp; Francis,
          <year>2013</year>
          . doi:
          <volume>10</volume>
          .4324/ 9781315838250.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>M. D. Conover</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Gonçalves</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Ratkiewicz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Flammini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Menczer</surname>
          </string-name>
          ,
          <article-title>Predicting the political alignment of Twitter users</article-title>
          ,
          <source>in: Proc. of PASSAT and SocialCom</source>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>192</fpage>
          -
          <lpage>199</lpage>
          . doi:
          <volume>10</volume>
          . 1109/PASSAT/SocialCom.
          <year>2011</year>
          .
          <volume>34</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gerrish</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Blei</surname>
          </string-name>
          ,
          <article-title>Predicting Legislative Roll Calls from Text</article-title>
          , in: L.
          <string-name>
            <surname>Getoor</surname>
          </string-name>
          , T. Schefer (Eds.),
          <source>Proc. of ICML, Omnipress</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>489</fpage>
          -
          <lpage>496</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Preoţiuc-Pietro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hopkins</surname>
          </string-name>
          , L. Ungar, Beyond Binary Labels:
          <article-title>Political Ideology Prediction of Twitter Users</article-title>
          , in: R. Barzilay, M.-Y. Kan (Eds.),
          <source>Proc. of ACL</source>
          , ACL,
          <year>2017</year>
          , pp.
          <fpage>729</fpage>
          -
          <lpage>740</lpage>
          . doi:
          <volume>10</volume>
          . 18653/v1/
          <fpage>P17</fpage>
          -1068.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-F.</given-names>
            <surname>Hurtado</surname>
          </string-name>
          ,
          <article-title>Political Tendency Identification in Twitter using Sentiment Analysis Techniques</article-title>
          , in: J.
          <string-name>
            <surname>Tsujii</surname>
          </string-name>
          , J. Hajic (Eds.),
          <source>Proc. of Coling</source>
          , Dublin City University and ACL,
          <year>2014</year>
          , pp.
          <fpage>183</fpage>
          -
          <lpage>192</lpage>
          . URL: https://aclanthology.org/C14-1019.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Walker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Saligrama</surname>
          </string-name>
          ,
          <article-title>Ideology Prediction from Scarce and Biased Supervision: Learn to Disregard the “What” and Focus on the “How”!</article-title>
          , in: A.
          <string-name>
            <surname>Rogers</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Boyd-Graber</surname>
          </string-name>
          , N. Okazaki (Eds.),
          <source>Proc. of ACL</source>
          (Volume
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          , ACL, Toronto, Canada,
          <year>2023</year>
          , pp.
          <fpage>9529</fpage>
          -
          <lpage>9549</lpage>
          . doi:
          <volume>10</volume>
          . 18653/v1/
          <year>2023</year>
          .
          <article-title>acl-long</article-title>
          .
          <volume>530</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>García-Díaz</surname>
          </string-name>
          , et al.,
          <source>Overview of PoliticES</source>
          <year>2022</year>
          :
          <article-title>Spanish Author Profiling for Political Ideology</article-title>
          ,
          <source>Procesamiento del Lenguaje Natural</source>
          <volume>69</volume>
          (
          <year>2022</year>
          )
          <fpage>265</fpage>
          -
          <lpage>272</lpage>
          . doi:
          <volume>10</volume>
          .26342/2022-69-23.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D.</given-names>
            <surname>Russo</surname>
          </string-name>
          , et al.,
          <source>PoliticIT at EVALITA</source>
          <year>2023</year>
          :
          <article-title>Overview of the political ideology detection in Italian texts task</article-title>
          ,
          <source>in: Proc. of EVALITA</source>
          , volume
          <volume>3473</volume>
          <source>of CEUR Workshop Proceedings, CEUR-WS.org</source>
          ,
          <year>2023</year>
          . URL: https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3473</volume>
          /paper7.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>G. M. Kurtoğlu</surname>
            <given-names>Eskişar</given-names>
          </string-name>
          , Ç. Çöltekin,
          <article-title>Emotions Running High? A Synopsis of the state of Turkish Politics through the ParlaMint Corpus</article-title>
          , in: D.
          <string-name>
            <surname>Fišer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Eskevich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Lenardič</surname>
          </string-name>
          , F. de Jong (Eds.),
          <source>Proc. of ParlaCLARIN</source>
          , ELRA,
          <year>2022</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>70</lpage>
          . URL: https://aclanthology.org/
          <year>2022</year>
          .parlaclarin-
          <volume>1</volume>
          .
          <fpage>10</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mochtak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rupnik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ljubešić</surname>
          </string-name>
          ,
          <article-title>The ParlaSent Multilingual Training Dataset for Sentiment Identification in Parliamentary Proceedings</article-title>
          , in: N.
          <string-name>
            <surname>Calzolari</surname>
            , M.-
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Kan</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Hoste</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Lenci</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Sakti</surname>
          </string-name>
          , N. Xue (Eds.),
          <source>Proc. of LREC, ELRA and ICCL</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>16024</fpage>
          -
          <lpage>16036</lpage>
          . URL: https://aclanthology. org/
          <year>2024</year>
          .lrec-main.
          <volume>1393</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>O.</given-names>
            <surname>Tarkka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Koljonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Korhonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Laine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Martiskainen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Elo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Laippala</surname>
          </string-name>
          ,
          <source>Automated Emotion Annotation of Finnish Parliamentary Speeches Using GPT-4</source>
          , in: D.
          <string-name>
            <surname>Fiser</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Eskevich</surname>
          </string-name>
          , D. Bordon (Eds.),
          <source>Proc. of ParlaCLARIN, ELRA and ICCL</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>76</lpage>
          . URL: https://aclanthology. org/
          <year>2024</year>
          .parlaclarin-
          <volume>1</volume>
          .
          <fpage>11</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>C.</given-names>
            <surname>Navarretta</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <article-title>Haltrup Hansen, Government and opposition in Danish parliamentary debates</article-title>
          , in: D.
          <string-name>
            <surname>Fiser</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Eskevich</surname>
          </string-name>
          , D. Bordon (Eds.),
          <source>Proc. of ParlaCLARIN, ELRA and ICCL</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>154</fpage>
          -
          <lpage>162</lpage>
          . URL: https://aclanthology.org/
          <year>2024</year>
          .parlaclarin-
          <volume>1</volume>
          .
          <fpage>23</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>K. A.</given-names>
            <surname>Hawkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Carlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Littvay</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. R.</given-names>
            <surname>Kaltwasser</surname>
          </string-name>
          (Eds.), The Ideational Approach to Populism: Concept, Theory, and
          <string-name>
            <surname>Analysis</surname>
          </string-name>
          ,
          <source>Extremism and Democracy</source>
          , Routledge,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>P.</given-names>
            <surname>Norris</surname>
          </string-name>
          , Measuring populism worldwide,
          <source>Party politics 26</source>
          (
          <year>2020</year>
          )
          <fpage>697</fpage>
          -
          <lpage>717</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rooduijn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. L. P.</given-names>
            <surname>Pirro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Halikiopoulou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Froio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Van</given-names>
            <surname>Kessel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. L.</given-names>
            <surname>De Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mudde</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Taggart</surname>
          </string-name>
          ,
          <article-title>The PopuList: A Database of Populist, Far-Left, and Far-Right Parties Using ExpertInformed Qualitative Comparative Classification (EiQCC)</article-title>
          ,
          <source>British Journal of Political Science</source>
          <volume>54</volume>
          (
          <year>2024</year>
          )
          <fpage>969</fpage>
          -
          <lpage>978</lpage>
          . doi:
          <volume>10</volume>
          .1017/S0007123423000431.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dutilh Novaes</surname>
          </string-name>
          , Argument and Argumentation, in: E. N.
          <string-name>
            <surname>Zalta</surname>
          </string-name>
          , U. Nodelman (Eds.),
          <source>The Stanford Encyclopedia of Philosophy</source>
          , Fall 2022 ed., Metaphysics Research Lab, Stanford University,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewiński</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mohammed</surname>
          </string-name>
          ,
          <article-title>Argumentation Theory</article-title>
          , in:
          <string-name>
            <given-names>K. B.</given-names>
            <surname>Jensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Craig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pooley</surname>
          </string-name>
          , E. W. Rothenbuhler (Eds.),
          <source>The International Encyclopedia of Communication Theory and Philosophy</source>
          , Wiley, Hoboken, NJ,
          <year>2016</year>
          . doi:10.1002/9781118766804.wbiect198.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>L.</given-names>
            <surname>Groarke</surname>
          </string-name>
          ,
          <article-title>Informal Logic</article-title>
          , in: E. N.
          <string-name>
            <surname>Zalta</surname>
          </string-name>
          , U. Nodelman (Eds.),
          <source>The Stanford Encyclopedia of Philosophy</source>
          , Spring 2024 ed., Metaphysics Research Lab, Stanford University,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>M.</given-names>
            <surname>Champagne</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.-V.</given-names>
            <surname>Pietarinen</surname>
          </string-name>
          ,
          <article-title>Why images cannot be arguments, but moving ones might</article-title>
          ,
          <source>Argumentation</source>
          <volume>34</volume>
          (
          <year>2020</year>
          )
          <fpage>207</fpage>
          -
          <lpage>236</lpage>
          . doi:10.1007/s10503-019-09484-0.
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>F.</given-names>
            <surname>Dunaway</surname>
          </string-name>
          ,
          <article-title>Images, Emotions, Politics</article-title>
          ,
          <source>Modern American History</source>
          <volume>1</volume>
          (
          <year>2018</year>
          )
          <fpage>369</fpage>
          -
          <lpage>376</lpage>
          . doi:10.1017/mah.2018.17.
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Kjeldsen</surname>
          </string-name>
          ,
          <article-title>The Rhetoric of Thick Representation: How Pictures Render the Importance and Strength of an Argument Salient</article-title>
          ,
          <source>Argumentation</source>
          <volume>29</volume>
          (
          <year>2015</year>
          )
          <fpage>197</fpage>
          -
          <lpage>215</lpage>
          . doi:10.1007/s10503-014-9342-2.
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fleming</surname>
          </string-name>
          ,
          <article-title>Can pictures be arguments?</article-title>
          ,
          <source>Argumentation and Advocacy</source>
          <volume>33</volume>
          (
          <year>1996</year>
          )
          <fpage>11</fpage>
          -
          <lpage>22</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Dove</surname>
          </string-name>
          ,
          <article-title>On Images as Evidence and Arguments</article-title>
          , in: F. H. van Eemeren,
          <string-name>
            <given-names>B.</given-names>
            <surname>Garssen</surname>
          </string-name>
          (Eds.),
          <source>Topical Themes in Argumentation Theory: Twenty Exploratory Studies, Argumentation Library</source>
          , Springer Netherlands, Dordrecht,
          <year>2012</year>
          , pp.
          <fpage>223</fpage>
          -
          <lpage>238</lpage>
          . doi:10.1007/978-94-007-4041-9_15.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>I.</given-names>
            <surname>Grancea</surname>
          </string-name>
          ,
          <article-title>Types of Visual Arguments</article-title>
          ,
          <source>Argumentum. Journal of the Seminar of Discursive Logic, Argumentation Theory and Rhetoric</source>
          <volume>15</volume>
          (
          <year>2017</year>
          )
          <fpage>16</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dimitrov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Bin</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Silvestri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Firooz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Nakov</surname>
          </string-name>
          , G. Da San Martino,
          <article-title>SemEval-2021 Task 6: Detection of Persuasion Techniques in Texts and Images</article-title>
          , in:
          <source>Proc. of SemEval</source>
          , ACL,
          <year>2021</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>98</lpage>
          . doi:10.18653/v1/2021.semeval-1.7.
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <article-title>Composition and Deformance: Measuring Imageability with a Text-to-Image Model</article-title>
          ,
          <source>CoRR abs/2306</source>
          .03168 (
          <year>2023</year>
          ). doi:10.48550/ARXIV.2306.03168. arXiv:2306.03168.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>M.</given-names>
            <surname>Brysbaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Warriner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kuperman</surname>
          </string-name>
          ,
          <article-title>Concreteness ratings for 40 thousand generally known English word lemmas</article-title>
          ,
          <source>Behavior Research Methods</source>
          <volume>46</volume>
          (
          <year>2014</year>
          )
          <fpage>904</fpage>
          -
          <lpage>911</lpage>
          . doi:10.3758/s13428-013-0403-5.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Naderi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bilu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Prabhakaran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Thijm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hirst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <article-title>Computational Argumentation Quality Assessment in Natural Language</article-title>
          ,
          in:
          <source>Proceedings of EACL 2017</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>176</fpage>
          -
          <lpage>187</lpage>
          . URL: https://aclanthology.org/E17-1017/.
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Spatharioti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Rothschild</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Goldstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hofman</surname>
          </string-name>
          ,
          <article-title>Comparing Traditional and LLM-based Search for Consumer Choice: A Randomized Experiment</article-title>
          ,
          <source>CoRR abs/2307</source>
          .03744 (
          <year>2023</year>
          ). doi:10.48550/ARXIV.2307.03744.
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <given-names>I.</given-names>
            <surname>Zelch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>A User Study on the Acceptance of Native Advertising in Generative IR</article-title>
          , in:
          <source>ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR</source>
          <year>2024</year>
          ), ACM,
          <year>2024</year>
          . doi:10.1145/3627508.3638316.
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , L. Liu,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lv</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sang</surname>
          </string-name>
          ,
          <article-title>CTR-Driven Advertising Image Generation with Multimodal Large Language Models</article-title>
          ,
          <source>in: Proceedings of the ACM Web Conference</source>
          <year>2025</year>
          , WWW '25,
          Association for Computing Machinery, New York, NY, USA,
          <year>2025</year>
          , pp.
          <fpage>2262</fpage>
          -
          <lpage>2275</lpage>
          . doi:10.1145/3696410.3714836.
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          [43]
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Qu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <article-title>AdGPT: Explore Meaningful Advertising with ChatGPT</article-title>
          ,
          <source>ACM Trans. Multimedia Comput. Commun. Appl</source>
          .
          <volume>21</volume>
          (
          <year>2025</year>
          ). doi:10.1145/3720546.
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          [44]
          <string-name>
            <given-names>S.</given-names>
            <surname>Feizi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hajiaghayi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rezaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <article-title>Online Advertisements with LLMs: Opportunities and Challenges</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2311.07601. arXiv:2311.07601.
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          [45]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hajiaghayi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lahaie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rezaei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shin</surname>
          </string-name>
          ,
          <article-title>Ad Auctions for LLMs via Retrieval Augmented Generation</article-title>
          ,
          <year>2024</year>
          . URL: http://papers.nips.cc/paper_files/paper/2024/hash/20dcab0f14046a5c6b02b61da9f13229-Abstract-Conference.html.
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          [46]
          <string-name>
            <given-names>S.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Zelch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bevendorff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>Detecting Generated Native Ads in Conversational Search</article-title>
          , in:
          <source>Companion Proceedings of the ACM Web Conference</source>
          <year>2024</year>
          , WWW '24,
          Association for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          , pp.
          <fpage>722</fpage>
          -
          <lpage>725</lpage>
          . doi:10.1145/3589335.3651489.
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          [47]
          <string-name>
            <given-names>E. E.</given-names>
            <surname>Schauster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ferrucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Neill</surname>
          </string-name>
          ,
          <article-title>Native Advertising is the New Journalism: How Deception Affects Social Responsibility</article-title>
          ,
          <source>American Behavioral Scientist</source>
          <volume>60</volume>
          (
          <year>2016</year>
          )
          <fpage>1408</fpage>
          -
          <lpage>1424</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          [48]
          <string-name>
            <given-names>B. W.</given-names>
            <surname>Wojdynski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. J.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <article-title>Going Native: Effects of Disclosure Position and Language on the Recognition and Evaluation of Online Native Advertising</article-title>
          ,
          <source>Journal of Advertising</source>
          <volume>45</volume>
          (
          <year>2016</year>
          )
          <fpage>157</fpage>
          -
          <lpage>168</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          [49]
          <string-name>
            <given-names>C.</given-names>
            <surname>Campbell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Grimm</surname>
          </string-name>
          ,
          <article-title>The challenges native advertising poses: Exploring potential federal trade commission responses and identifying research needs</article-title>
          ,
          <source>Journal of Public Policy &amp; Marketing</source>
          <volume>38</volume>
          (
          <year>2019</year>
          )
          <fpage>110</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          [50]
          <string-name>
            <given-names>B.</given-names>
            <surname>Eyada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Milla</surname>
          </string-name>
          ,
          <article-title>Native Advertising: Challenges and Perspectives</article-title>
          ,
          <source>Journal of Design Sciences and Applied Arts</source>
          <volume>1</volume>
          (
          <year>2020</year>
          )
          <fpage>67</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          [51]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wiegmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Kolyada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Grahm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Elstner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Loebe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Stein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Potthast</surname>
          </string-name>
          ,
          <article-title>Continuous Integration for Reproducible Shared Tasks with TIRA.io</article-title>
          , in:
          <string-name>
            <given-names>J.</given-names>
            <surname>Kamps</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maistro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gurrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Kruschwitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Caputo</surname>
          </string-name>
          (Eds.),
          <source>Advances in Information Retrieval. 45th European Conference on IR Research (ECIR</source>
          <year>2023</year>
          ), Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York,
          <year>2023</year>
          , pp.
          <fpage>236</fpage>
          -
          <lpage>241</lpage>
          . doi:10.1007/978-3-031-28241-6_20.
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          [52]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fröbe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Merker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Scells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hagen</surname>
          </string-name>
          , M. Potthast,
          <article-title>TIREx Tracker: The Information Retrieval Experiment Tracker</article-title>
          , in:
          <source>48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR</source>
          <year>2025</year>
          ), ACM,
          <year>2025</year>
          . doi:10.1145/3726302.3730297.
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          [53]
          <string-name>
            <given-names>T.</given-names>
            <surname>Breuer</surname>
          </string-name>
          , J. Keller, P. Schaer,
          <article-title>ir_metadata: An extensible metadata schema for IR experiments</article-title>
          , in: E. Amigó,
          <string-name>
            <given-names>P.</given-names>
            <surname>Castells</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Carterette</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. S.</given-names>
            <surname>Culpepper</surname>
          </string-name>
          , G. Kazai (Eds.),
          <source>SIGIR '22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , Madrid, Spain,
          July 11 - 15
          ,
          <year>2022</year>
          , ACM,
          <year>2022</year>
          , pp.
          <fpage>3078</fpage>
          -
          <lpage>3089</lpage>
          . doi:10.1145/3477495.3531738.
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          [54]
          <string-name>
            <given-names>H.</given-names>
            <surname>Grice</surname>
          </string-name>
          ,
          <source>Studies in the Way of Words</source>
          , William James lectures, Harvard University Press,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          [55]
          <string-name>
            <given-names>G.</given-names>
            <surname>Skitalinskaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Klaf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wachsmuth</surname>
          </string-name>
          ,
          <article-title>Learning From Revisions: Quality Assessment of Claims in Argumentation at Scale</article-title>
          , in: P. Merlo,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tiedemann</surname>
          </string-name>
          , R. Tsarfaty (Eds.),
          <source>Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume (EACL 2021)</source>
          , Online, April 19 - 23, 2021
          , Association for Computational Linguistics,
          <year>2021</year>
          , pp.
          <fpage>1718</fpage>
          -
          <lpage>1729</lpage>
          . doi:10.18653/V1/2021.EACL-MAIN.147.
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          [56]
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Jasper and Stella: Distillation of SOTA Embedding Models</article-title>
          ,
          <source>CoRR abs/2412</source>
          .19048 (
          <year>2024</year>
          ). doi:10.48550/ARXIV.2412.19048. arXiv:2412.19048.
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          [57]
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Vallecillo-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Martín-Valdivia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Montejo-Ráez</surname>
          </string-name>
          ,
          <article-title>SINAI at Touché: Leveraging Guided Prompt Strategies for Retrieval-Augmented Debate</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum</source>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>