<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Team GPLSI at QClef 2025: Quantum-Inspired Instance Selection and Clustering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Pablo Consuegra-Ayala</string-name>
          <email>juan.consuegra@ua.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aitana Morote-Martínez</string-name>
          <email>aitana.morote@ua.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francisco Valero-Abellón</string-name>
          <email>francisco.valero@ua.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Lloret</string-name>
          <email>elloret@dlsi.ua.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paloma Moreda</string-name>
          <email>moreda@dlsi.ua.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manuel Palomar</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Language and Computing Systems, University of Alicante</institution>
          ,
          <addr-line>Alicante, Spain, 03690</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Digital Intelligence Centre (CENID), University of Alicante</institution>
          ,
          <addr-line>Alicante, Spain, 03690</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Natural Language Processing and Information Systems Group (GPLSI), University of Alicante</institution>
          ,
          <addr-line>Alicante, Spain, 03690</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University Institute for Computing Research (IUII), University of Alicante</institution>
          ,
          <addr-line>Alicante, Spain, 03690</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper presents the participation of the GPLSI team in the Quantum CLEF (QClef) Lab at CLEF 2025, focusing on Task 2 - Instance Selection and Task 3 - Clustering. The QClef Lab explores the applicability of quantum and quantum-inspired techniques to core AI tasks, emphasizing optimization efficiency and data reduction. In Task 2, we propose three multi-paradigm approaches for selecting representative training instances for sentiment classification, leveraging sentiment-aware pairing, local set-based criteria, and classical heuristics. In Task 3, we introduce a single quantum-inspired clustering framework that integrates four distinct pivot selection strategies for document grouping in embedding space. Our methods achieved competitive performance across both tasks. In particular, our LocalSets method achieved the highest effectiveness in Task 2 while substantially reducing the training set, and our FPS-Medoids approach delivered the best results for Task 3 in terms of nDCG@10. Overall, our findings support the potential of annealing-based techniques to deliver effective trade-offs between performance and computational efficiency in realistic machine learning pipelines.</p>
      </abstract>
      <kwd-group>
        <kwd>Quantum Computing</kwd>
        <kwd>Quantum Annealing</kwd>
        <kwd>Quantum NLP</kwd>
        <kwd>CLEF</kwd>
        <kwd>Instance Selection</kwd>
        <kwd>Clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The Quantum CLEF (QClef) Lab at CLEF 2025 [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] explores the integration of quantum computing
technologies into real-world information retrieval and machine learning workflows. The lab serves
as a benchmark initiative to assess how quantum-inspired and quantum-native methods can enhance
computational efficiency and effectiveness across core AI tasks. By leveraging both quantum and
simulated annealers, QClef aims to investigate the suitability of these technologies for structured and
unstructured data analysis.
      </p>
      <p>QClef 2025 is organized into three primary tasks: Task 1 - Feature Selection, Task 2 - Instance Selection,
and Task 3 - Clustering. Each task presents a different machine learning challenge, offering opportunities
to apply quantum approaches in data preprocessing, optimization, and representation learning.</p>
      <p>Our team, GPLSI, participated in Task 2 and Task 3. The main hypothesis guiding our work is that
quantum-inspired optimization methods can lead to more representative and compact data subsets,
thereby improving downstream learning and analysis tasks such as classification and clustering. In
particular, we explore the utility of Quadratic Unconstrained Binary Optimization (QUBO) formulations
and hybrid clustering strategies to address instance selection and structure discovery in sentiment
analysis datasets.</p>
      <p>Our specific objectives are:
• To develop and compare three instance selection approaches for Task 2.
• To design a quantum-inspired clustering method for Task 3.
• To evaluate all methods using the official datasets and protocols provided by QClef 2025.</p>
      <p>The remainder of this report is structured as follows. Sections 2 and 3 detail our methodology for
instance selection and clustering, respectively. Section 4 presents the experimental setup and discusses
results and observations. Finally, Section 5 concludes with a summary and future directions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methodology — Task 2: Instance Selection</title>
      <p>This section outlines the methodology employed for Task 2, which focuses on instance selection (IS)
aimed at improving the efficiency and performance of downstream sentiment analysis models. We
explore three distinct approaches, each designed to identify representative and informative subsets of
data. Section 2.1 introduces a pairing strategy based on sentiment polarity and semantic similarity to
capture nuanced contrasts. Section 2.2 details a technique that leverages local neighborhoods bounded
by the local set concept for instance selection. Finally, Section 2.3 presents a hybrid approach combining
heuristic pre-filtering with statistical optimization. Together, these methodologies offer complementary
perspectives on selecting high-quality training instances.</p>
      <sec id="sec-2-1">
        <title>2.1. Approach I — Sentiment Pairs</title>
        <p>This approach focuses on selecting a balanced and representative subset of sentiment-labeled instances
by leveraging clustering and quantum-inspired optimization. The goal is to construct a reduced dataset
that maintains semantic diversity and class balance. The method consists of three main stages: (I)
balancing through clustering, (II) pair construction and selection, and (III) subset optimization via
QUBO.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Balancing via K-Medoids Clustering</title>
          <p>
            To ensure a balanced set of instances from both sentiment classes, we first independently apply
K-Medoids clustering [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] to the positive and negative subsets of the dataset. K-Medoids is chosen for its
robustness in identifying representative instances (medoids), which are actual data points that minimize
within-cluster dissimilarity.
          </p>
          <p>Let 𝒟+ and 𝒟− be the sets of positive and negative instances, respectively. We apply K-Medoids to
each set to obtain:</p>
          <p>ℳ+ = KMedoids(𝒟+, k), ℳ− = KMedoids(𝒟−, k)
where k is the number of desired medoids per class. The value of k controls the number of selected
instances from each class and ensures that |ℳ+| = |ℳ−|, enforcing class balance.</p>
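This per-class balancing step can be sketched with a minimal, dependency-free K-Medoids (a simplified Lloyd-style alternation rather than the full PAM algorithm used by standard libraries; the toy 2-D vectors stand in for document embeddings):

```python
import math
import random

def kmedoids(points, k, iters=20, seed=0):
    """Naive K-Medoids: assign each point to its nearest medoid, then move
    each medoid to the member minimizing the within-cluster distance sum."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    d = lambda a, b: math.dist(points[a], points[b])
    for _ in range(iters):
        clusters = {m: [] for m in medoids}
        for i in range(len(points)):
            clusters[min(medoids, key=lambda m: d(i, m))].append(i)
        new = [min(mem, key=lambda c: sum(d(c, j) for j in mem)) if mem else m
               for m, mem in clusters.items()]
        if sorted(new) == sorted(medoids):  # converged
            break
        medoids = new
    return sorted(medoids)

# Run K-Medoids independently on each class: k medoids per class.
pos = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9)]
neg = [(0.0, 5.0), (0.2, 5.1), (5.0, 0.0), (4.9, 0.2)]
m_pos = kmedoids(pos, 2)
m_neg = kmedoids(neg, 2)
```

Because the same k is used for both classes, the two medoid sets have equal size, which is exactly the class-balance guarantee stated above.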
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Pair Construction and Selection</title>
          <p>After balancing the dataset through medoid extraction (Section 2.1.1), we construct cross-class pairs by
combining positive and negative medoids in a greedy manner. Each positive medoid is paired with its
closest unpaired negative medoid based on cosine similarity in the feature space. This process ensures
that each negative medoid is used at most once, and the resulting sequence of pairs reflects a decreasing
order of similarity —from the most semantically aligned pairs to less similar ones.</p>
          <p>Formally, let:
• ℳ+ = {m₁+, . . . , mₖ+} be the set of positive medoids,
• ℳ− = {m₁−, . . . , mₖ−} be the set of negative medoids,
• sim(mᵢ+, mⱼ−) denote the cosine similarity between any two medoids.</p>
          <p>We initialize an empty pair set 𝒫 = ∅ and an empty set 𝒰 of used negative medoids. For each mᵢ+ ∈ ℳ+, we select:
mⱼ− = arg max_{m− ∈ ℳ− ∖ 𝒰} sim(mᵢ+, m−)
where 𝒰 is the set of already paired negative medoids. We then add the pair (mᵢ+, mⱼ−) to 𝒫 and update
𝒰 ← 𝒰 ∪ {mⱼ−}.</p>
          <p>This greedy construction results in k semantically aligned positive-negative pairs, each with an
associated similarity score.</p>
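A minimal sketch of this greedy construction (toy 2-D vectors stand in for the medoid embeddings):

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def greedy_pairs(pos, neg):
    """Pair each positive medoid with its most similar unused negative
    medoid, so every negative medoid is used at most once."""
    used, pairs = set(), []
    for i, p in enumerate(pos):
        j = max((j for j in range(len(neg)) if j not in used),
                key=lambda j: cosine(p, neg[j]))
        used.add(j)
        pairs.append((i, j, cosine(p, neg[j])))  # (pos idx, neg idx, score)
    return pairs

pos = [(1.0, 0.0), (0.0, 1.0)]
neg = [(0.1, 0.9), (0.9, 0.1)]
pairs = greedy_pairs(pos, neg)
```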
          <p>Pair Selection Strategies To reduce the number of candidate pairs while maintaining semantic and
similarity diversity, we apply an agglomerative clustering approach that integrates two complementary
views of pair dissimilarity: (1) the difference in intra-pair similarity scores and (2) the semantic distance
between mean document vectors of the pairs.</p>
          <p>Let each pair pᵢ = (mᵢ+, mᵢ−) ∈ 𝒫 consist of a positive and a negative medoid, along with their cosine
similarity score sᵢ. We define a combined distance measure between any two pairs pᵢ and pⱼ as follows:</p>
          <p>1. Similarity Score Distance. The first component quantifies the absolute difference in intra-pair
similarity: d_sim(pᵢ, pⱼ) = |sᵢ − sⱼ|</p>
          <p>2. Semantic Mean Vector Distance. Each pair is represented by the mean of its two constituent
document vectors, μᵢ = ½ (mᵢ+ + mᵢ−). The semantic distance between two pairs is then computed using
cosine distance: d_vec(pᵢ, pⱼ) = 1 − cos(μᵢ, μⱼ)</p>
          <p>3. Combined Distance Metric. The final distance between pairs is the average of the two
components: d(pᵢ, pⱼ) = ½ (d_sim(pᵢ, pⱼ) + d_vec(pᵢ, pⱼ))</p>
          <p>
            Using this combined pairwise distance matrix, we apply Agglomerative Clustering [
            <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
            ] with average
linkage and a precomputed distance metric. This clustering groups pairs that are both semantically
similar and close in intra-pair similarity score. From each resulting cluster, a single representative pair
is selected. The representative is defined as the most central pair within its cluster, i.e., the one with the
minimum total distance to all other members of the cluster:
p* = arg min_{pᵢ ∈ cluster} Σ_{pⱼ ∈ cluster} d(pᵢ, pⱼ)
          </p>
          <p>This yields a final set 𝒫′ ⊆ 𝒫 of n representative and diverse pairs that form the candidate pool
for the subsequent QUBO-based optimization (Section 2.1.3). By integrating both intra-pair and
inter-pair semantics, this selection process ensures that the reduced dataset maintains coverage over the
underlying structure of the sentiment space.</p>
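The combined distance and the per-cluster representative selection can be sketched as follows (cluster labels are assumed given; in the pipeline they come from the average-linkage agglomerative clustering step):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def pair_distance(p, q):
    """Combined distance: average of the intra-pair similarity gap and the
    cosine distance between mean vectors. A pair is represented as
    (positive_vec, negative_vec, similarity_score)."""
    d_sim = abs(p[2] - q[2])
    mu_p = [(a + b) / 2 for a, b in zip(p[0], p[1])]
    mu_q = [(a + b) / 2 for a, b in zip(q[0], q[1])]
    return (d_sim + (1 - cosine(mu_p, mu_q))) / 2

def representatives(pairs, labels):
    """Pick, per cluster, the pair with minimum total distance to members."""
    reps = {}
    for c in set(labels):
        members = [i for i, l in enumerate(labels) if l == c]
        reps[c] = min(members,
                      key=lambda i: sum(pair_distance(pairs[i], pairs[j])
                                        for j in members))
    return reps

pairs = [((1.0, 0.0), (1.0, 0.0), 0.9),
         ((0.9, 0.1), (0.9, 0.1), 0.8),
         ((0.0, 1.0), (0.0, 1.0), 0.1)]
reps = representatives(pairs, labels=[0, 0, 1])
```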
        </sec>
        <sec id="sec-2-1-3">
          <title>2.1.3. QUBO Formulation for Subset Selection</title>
          <p>
            Once a reduced set of candidate cross-class pairs ′ is obtained (Section 2.1.2), we aim to select a
representative subset that is both informative and non-redundant. While auxiliary procedures such
as K-Medoids clustering for balancing, greedy algorithms for pair construction, and agglomerative
clustering for pair selection operate within polynomial time complexity, the overall problem of instance
selection often involves solving a combinatorial core that is inherently hard. In our case, this core is
formulated as a Quadratic Unconstrained Binary Optimization (QUBO) problem [
            <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
            ], which is NP-hard.
Therefore, despite relying on efficient polynomial-time heuristics to pre-process or structure the data,
the final optimization step still represents a significant computational challenge. This justifies the use of
quantum or quantum-inspired techniques, such as quantum annealing, to solve the QUBO formulation
more effectively than classical brute-force or metaheuristic approaches, especially when scalability
and solution quality are critical. The QUBO formulation allows us to encode the trade-off between the
relevance of individual pairs and the redundancy among them, and solve the resulting combinatorial
problem using classical or quantum annealing techniques.
          </p>
          <p>Relevance Score The relevance (or importance) of each pair is quantified based on how far its
similarity score deviates from the midpoint of the minimum and maximum similarity scores among all
candidate pairs. This discourages the selection of semantically central pairs (i.e., those that are neither
particularly similar nor particularly dissimilar):</p>
          <p>Let sᵢ denote the cosine similarity score of pair pᵢ ∈ 𝒫′, and let:
s̄ = (min_{j} sⱼ + max_{j} sⱼ) / 2
Then, the relevance score for pᵢ is defined as:
relevanceᵢ = |sᵢ − s̄|
This assigns higher importance to pairs that diverge from the central similarity mass, promoting
diversity in semantic strength.</p>
          <p>Redundancy Score To penalize semantic overlap among selected pairs, we define a redundancy score
that measures similarity between every pair of candidate pairs. For two candidate pairs pᵢ and pⱼ, redundancy is
computed as the sum of cosine similarities across three views:
1. Between their positive document vectors (mᵢ+, mⱼ+).
2. Between their negative document vectors (mᵢ−, mⱼ−).
3. Between their mean vectors (μᵢ, μⱼ).</p>
          <p>The total redundancy between pᵢ and pⱼ is:</p>
          <p>redundancyᵢⱼ = cos(mᵢ+, mⱼ+) + cos(mᵢ−, mⱼ−) + cos(μᵢ, μⱼ)</p>
          <p>This encourages diversity by penalizing the inclusion of multiple pairs that represent similar semantic
regions in the embedding space.</p>
          <p>QUBO Objective Finally, we define a QUBO objective function that combines relevance and
redundancy:</p>
          <p>min_{x ∈ {0,1}ⁿ}  − Σᵢ relevanceᵢ · xᵢ + Σᵢ Σⱼ redundancyᵢⱼ · xᵢ xⱼ + penalty</p>
          <p>where xᵢ ∈ {0, 1} indicates whether pair pᵢ is selected, n is the total number of candidate pairs, the
first term rewards the inclusion of relevant pairs, and the second term penalizes redundant selections.
To restrict the number of selected pairs to n′, we add a soft constraint: λ (Σᵢ xᵢ − n′)². This term is
incorporated into the QUBO as an additional quadratic penalty.</p>
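A minimal sketch of this QUBO matrix, with the soft size constraint expanded into linear and pairwise coefficients (the penalty weight `lam` is illustrative, and the constant term of the expansion is dropped since it does not change the argmin); brute force verifies the intended behaviour on a toy instance:

```python
import itertools

def build_pair_qubo(sims, reds, n_target, lam=2.0):
    """Negative relevance on the diagonal, redundancy off-diagonal, plus the
    soft constraint lam*(sum x - n_target)^2 expanded via x_i^2 = x_i."""
    mid = (min(sims) + max(sims)) / 2
    relevance = [abs(s - mid) for s in sims]
    n = len(sims)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = -relevance[i] + lam * (1 - 2 * n_target)  # linear part
        for j in range(i + 1, n):
            Q[i][j] = reds[i][j] + 2 * lam  # redundancy + constraint cross term
    return Q

def energy(Q, x):
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Toy instance: pairs 0 and 2 are relevant and non-redundant, pair 1 is not.
sims = [0.1, 0.5, 0.9]
reds = [[0.0, 5.0, 0.1], [5.0, 0.0, 0.2], [0.1, 0.2, 0.0]]
Q = build_pair_qubo(sims, reds, n_target=2)
best = min(itertools.product([0, 1], repeat=3), key=lambda x: energy(Q, x))
```

In the pipeline this QUBO is handed to a (simulated or quantum) annealer rather than enumerated; the brute-force check is feasible only at toy sizes.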
        </sec>
        <sec id="sec-2-1-4">
          <title>2.1.4. Submission Configuration Summary</title>
          <p>All submissions—labeled as SentimentPairs—are based on variations of key parameters used in the
instance selection and QUBO optimization pipeline. The two main configurations differ in how output
pairs are compiled—either from the final QUBO solution only ( just-final) or from QUBO-selected
pairs enriched with related instances (pair-related). In the latter case, related instances refer to
pairs that belong to the same cluster as the QUBO-selected pairs, based on the clustering step used to
reduce the candidate set from k to n (Section 2.1.2). Each submission uses a fixed number of annealing
reads (2000), enforces a balance limit of 90% of the minority class size, sets the number of candidate
cross-class pairs n = 150, and restricts the QUBO-selected final subset to n′ = 0.5 × n = 75 pairs.
Table 1 summarizes the explored configurations.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Approach II — Local Sets</title>
        <p>
          The main idea of this method is to exploit the concept of local set [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], a geometrical construct describing
the instance neighborhood, to select an optimal set of instances for training a classification model.
Our approach extends the Local Set Border Selector (LSBo) [9], a hybrid method composed of a noise
filtering step followed by condensation of the noise-free database. It also benefits from a modified version
of a supervised clustering algorithm based on local sets, LS-clustering [10], to prepare the dataset for
selection with the annealing technique. Throughout our method, distances between instances are
computed as Euclidean distances.
        </p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Instance Taxonomy</title>
          <p>Our approach relies on a categorization of instances based on their contribution to classification accuracy,
where each category can be quantified using a measure based on local sets.</p>
          <p>• Noise Instances Noise instances disagree in classification with their neighborhoods, thereby
obscuring the relationship between the features of an instance and its label [11]. Datasets like
Yelp Reviews [12] and Vader NYT [13], where labels originate from subjective human ratings, are
prone to such noise. Within the Local Set-Based Smoother (LSSm) framework [9], noise instances
are identified with a measure of harmfulness, tied to how many instances identify them as their
nearest enemy.
• Redundant/Useless Instances By definition, these instances do not significantly influence
classification. Redundancy arises when excessive same-class neighbors exist. LSSm characterizes
redundancy with the usefulness measure, based on the number of local sets an instance belongs
to.
• Border instances Instances near class borders can be identified within the LSBo framework as
instances having the lowest Local Set Cardinality (LSC) among the members of their local sets.
• Central/typical instances Typical instances, also called prototypes, are instances central enough
to represent a local region. They are determined within the scope of the Local Set-based Centroids
selector (LSCo) [9], which selects a set of centroids of clusters obtained with LS-clustering.</p>
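The local-set machinery underlying these categories can be sketched as follows: the local set of an instance is the set of instances strictly closer to it than its nearest enemy, and LSC is the local set's cardinality (toy labeled data in place of real embeddings):

```python
import math

def local_sets(points, labels):
    """Local set of instance i: all instances strictly closer to i than i's
    nearest enemy (the closest instance with a different label)."""
    n = len(points)
    ls = []
    for i in range(n):
        enemy = min(math.dist(points[i], points[j])
                    for j in range(n) if labels[j] != labels[i])
        ls.append({j for j in range(n)
                   if j != i and math.dist(points[i], points[j]) < enemy})
    return ls

pts = [(0, 0), (0, 1), (0, 2), (5, 0), (5, 1)]
lab = [0, 0, 0, 1, 1]
ls = local_sets(pts, lab)
lsc = [len(s) for s in ls]  # Local Set Cardinality per instance
```

Usefulness (how many local sets an instance belongs to), harmfulness (how often it is a nearest enemy), and the LSBo border criterion (lowest LSC within one's local set) are all derived from these sets.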
          <p>In the instance selection literature, methods classified as edition methods are typically directed at
removing noise, while condensation methods aim at removing redundancy. Methods combining both,
such as our method, are called hybrid [14]. Generally, the selection criterion also fluctuates between
focusing on border instances or central instances, as both contribute positively to classification accuracy.
Finding an optimal criterion is subject to the dataset characteristics [10]. Our method is a combination
of both main criteria, as we retain both a set of border instances and a set of central instances.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. Method Objective</title>
          <p>Our method aims to enhance LSBo, which already achieves a good trade-off between accuracy and
reduction, supported by empirical evidence [9], by selecting an additional set of instances through a
high-complexity step executed with quantum annealing.</p>
          <p>To maximize accuracy with this additional set, rather than prioritizing centrality, our strategy will be
to select the less-redundant set among the non-border instances. Our heuristic to detect redundancy is
based on the distribution of same-class instances in local regions delimited by borders. More concretely,
a pair of same-class instances (or clusters, as will be explained hereafter) will be awarded or penalized
for selection by comparing their inner distance to the sum of the distances to their relative nearest
border.</p>
          <p>The complete algorithm pseudocode can be found in Algorithm 1.</p>
          <p>In the following sections, the algorithm steps 3, 4 and 5 will be explained in detail. A more complete
presentation of the algorithms LSSm and LSBo of steps 1 and 2 can be found in the paper [9]. Section 2.2.3
details the method used for clustering, Section 2.2.4 presents our heuristic for preselecting clusters and
Section 2.2.5 explains how we obtain the weights of the QUBO matrix determining the final selection.</p>
        </sec>
        <sec id="sec-2-2-3">
          <title>2.2.3. Modified LS-Clustering</title>
          <p>Our strategy to reduce the problem size is to associate each decision variable with a cluster containing
same-class local regions of instances. To that end, non-border, non-noisy instances are clustered using
a modified version of local set-based clustering [10]. The original algorithm selects local sets with a
high LSC, resulting in clusters containing wide same-class regions. We modify the clustering algorithm
by processing small local sets first, obtaining numerous clusters containing smaller patches of instances
of the same region. This allows us to select the least redundant patch of each region.</p>
          <p>The algorithm pseudocode can be found in Algorithm 2.</p>
          <p>Algorithm 1 (overview): (1) filter noise with LSSm; (2) select border instances with LSBo; (3) cluster the remaining instances with modified LS-clustering (Section 2.2.3); (4) keep only the clusters closest to borders (Section 2.2.4); (5) remove redundant clusters with quantum annealing (QA) and add the selected cluster members to the border set (Section 2.2.5).</p>
          <p>Algorithm 2 Modified Local Set-based Clustering
Require: Set of instances T
Require: Set of noise instances N and border instances B
Set of clusters C ← ∅
T ← T ∖ (N ∪ B) ▷ Filter noise and border instances
Compute the LS (Local Set) of each instance in T
Sort instances in ascending order of LSC (Local Set Cardinality) ▷ Smaller local sets are processed first
added ← ∅ ▷ Instances already added to a cluster
for each e ∈ T do
  if e ∉ added then
    added ← added ∪ {e}
    Create cluster c with medoid e and members LS(e) ∖ added
    added ← added ∪ LS(e)
    Add c to C
  end if
end for
return C</p>
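A sketch of this clustering step in Python, assuming local sets are precomputed as index sets (the data layout is illustrative):

```python
def ls_clustering(local_sets, noise, borders):
    """Modified LS-clustering: process instances by ascending local-set
    cardinality so each cluster covers a small same-class patch; noise and
    border instances are excluded up front."""
    keep = set(range(len(local_sets))) - noise - borders
    order = sorted(keep, key=lambda i: len(local_sets[i]))  # small LSC first
    clusters, added = [], set()
    for e in order:
        if e not in added:
            added.add(e)
            members = (local_sets[e] & keep) - added
            clusters.append((e, members))  # (medoid, other members)
            added |= local_sets[e]
    return clusters

# Local sets from a toy labeled dataset (indices into that dataset).
local_sets = [{1, 2}, {0, 2}, {0, 1}, {4}, {3}]
clusters = ls_clustering(local_sets, noise=set(), borders={2, 4})
```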
        </sec>
        <sec id="sec-2-2-4">
          <title>2.2.4. Cluster Preselection</title>
          <p>Filter noise and border instances
Smaller local sets are processed first
Instances already added to a cluster
Due to the hardware constraints, the number of problem decision variables is restricted to 150, meaning
that the quantum annealer can handle at most 150 candidate clusters at once. To meet this constraint, we
adopt a heuristic that preselects the 150 clusters whose centroids are closest to their nearest same-class
border instance. The process is as follows:
1. For each cluster , compute its centroid   as the mean of its member instances.
  =
1 ∑︁ 
|| ∈
* = ∈m, in = ‖  − ‖2
2. Let  be the set of border instances obtained using the LSBo method, and  ∈ {0, 1} the binary
class of each cluster. For each cluster , find the distance * to the nearest same-class border
instance  ∈ .</p>
          <p>3. Select the 150 clusters with the smallest distances * to form the preselection set.</p>
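A sketch of this preselection heuristic (variable names and the data layout are illustrative):

```python
import math

def preselect(centroids, classes, borders, border_classes, limit=150):
    """Keep the `limit` clusters whose centroid is closest to a same-class
    border instance."""
    def nearest_border_dist(i):
        return min(math.dist(centroids[i], b)
                   for b, yb in zip(borders, border_classes)
                   if yb == classes[i])
    order = sorted(range(len(centroids)), key=nearest_border_dist)
    return order[:limit]

centroids = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0)]
classes = [0, 0, 0]
borders = [(0.0, 1.0)]
border_classes = [0]
kept = preselect(centroids, classes, borders, border_classes, limit=2)
```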
        </sec>
        <sec id="sec-2-2-5">
          <title>2.2.5. Cluster Selection using Quantum Annealing</title>
          <p>The time complexities of the algorithms described so far —LSBo, modified LS-clustering
and the algorithm for computing nearest borders— are all polynomial in the input size, where T
represents the set of initial instances, C represents the total number of clusters, B the number of border
instances and d the feature dimensionality. Therefore, running these algorithms before the step that
will be executed in the quantum annealer does not compromise efficiency.</p>
          <p>The problem of selecting the least redundant set of clusters based on their location around border
instances is formulated as a QUBO model. Specifically, a pair of same-class clusters is rewarded if
the distance between their centroids is greater than the sum of their respective distances to the nearest
border instance, as such cases are likely to correspond to clusters belonging to distinct local regions
separated by borders. The QUBO formulation is described below:</p>
          <p>Let each preselected cluster i be associated with:
• Centroid μᵢ,
• Nearest border distance d*ᵢ,
• Class label yᵢ ∈ {0, 1}.</p>
          <p>We define the inter-cluster distance and the separation margin Δᵢⱼ as:</p>
          <p>inter_distᵢⱼ = ‖μᵢ − μⱼ‖, Δᵢⱼ = inter_distᵢⱼ − (d*ᵢ + d*ⱼ + τ)
where τ ≥ 0 is a tolerance parameter that increases the required separation between same-class clusters
to avoid being considered redundant. Larger values of τ reflect stricter redundancy criteria. For both
datasets (Vader NYT and Yelp Reviews), we use a value of 0.2 for the parameter τ.</p>
          <p>We define the QUBO matrix Q ∈ R^{n×n} as:
Q[i, j] = self_bias, if i = j
Q[i, j] = α · 1/(inter_distᵢⱼ + ε), if Δᵢⱼ &lt; 0 and yᵢ = yⱼ
Q[i, j] = −β · 1/(inter_distᵢⱼ + ε), if Δᵢⱼ ≥ 0 and yᵢ = yⱼ
Q[i, j] = 0, if yᵢ ≠ yⱼ
with the following parameters (we use the same parameter values for both datasets):
• α = 1.0: penalty scaling factor for redundant same-class cluster pairs,
• β = 1.0: reward scaling factor for well-separated same-class cluster pairs,
• self_bias = −1.0: negative diagonal term that promotes sparsity in the selection,
• ε = 10⁻⁶: small constant added to avoid division by zero.</p>
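A sketch of this QUBO construction (Greek parameter names stand in for the symbols stripped from the text; the self_bias, ε and τ roles follow the parameter list above):

```python
import math

ALPHA, BETA, SELF_BIAS, EPS, TAU = 1.0, 1.0, -1.0, 1e-6, 0.2

def build_cluster_qubo(centroids, border_dists, classes):
    """Penalize likely-redundant same-class cluster pairs (negative margin),
    reward well-separated ones; different-class pairs do not interact."""
    n = len(centroids)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = SELF_BIAS  # diagonal term promoting sparsity
        for j in range(i + 1, n):
            if classes[i] != classes[j]:
                continue
            inter = math.dist(centroids[i], centroids[j])
            margin = inter - (border_dists[i] + border_dists[j] + TAU)
            Q[i][j] = (ALPHA if margin < 0 else -BETA) / (inter + EPS)
    return Q

# Clusters 0 and 1 sit close together (redundant); cluster 2 is far away.
Q = build_cluster_qubo([(0.0, 0.0), (0.1, 0.0), (10.0, 0.0)],
                       [1.0, 1.0, 1.0], [0, 0, 0])
```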
          <p>To enforce class balance in the selected set of clusters, we add a penalty to the QUBO formulation:
Q ← Q + λ · wwᵀ, where wᵢ = 1 − 2yᵢ
and λ regulates the strength of the penalty term. We use a value of λ = 5.0 for both datasets. The
squared sum of the terms (1 − 2yᵢ) over the selected clusters i is minimized when the final solution
contains the same number of clusters of each class. As the optimal solution is the one that minimizes
the quadratic form xᵀQx, adding this term with a high enough λ allows us to promote class balance.
After obtaining the final cluster selection, all the selected cluster member instances are added to the
submission set containing the border instances.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Approach III — Pre-Selection and Statistical Tuning</title>
        <p>We propose PREST (Pre-Selection and Statistical Tuning for QUBO Annealing), a three-stage pipeline
that begins with classical heuristics and ends with the exact same optimization task running on both
CPUs and quantum hardware. First, we build candidate panels greedily and wrap them into a single
QUBO [15]. Because the objective and hyper-parameters stay identical, the classical-versus-quantum
comparison is strictly like-for-like. The workflow comprises three stages:
1. Stage I – Representation &amp; Pre-filtering: Sentence-BERT embeddings are enriched with scalar
features. Nine classical heuristics generate candidate subsets for a grid of target sizes k. A metric
bundle {silhouette, Davies–Bouldin, balance, diversity, coverage} [16] ranks the outputs.
2. Stage II – Combinatorial Optimization: each of the four best heuristics selected in Stage I is
cast as a QUBO whose coefficients encode diversity, class-frequency fairness, emotional contrast,
and sentiment confidence. A SA sweep over k and the QUBO weights keeps the two top-scoring
panels per heuristic.
3. Stage III – Quantum-Ready Aggregation: To comply with the ≤ 150-variable limit of current
quantum annealers, the retained subsets are clustered into super-nodes. We solve two QUBO
variants—unconstrained and cardinality-constrained—with both SA and QA on a D-Wave QPU,
yielding the final instance sets.</p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Stage I — Representation and Pre-filtering</title>
          <p>Augmented sentence embeddings Each sentence is embedded with the all-mpnet-base-v2
Sentence-BERT model [17], producing a 768-dimensional vector. To capture additional information we
concatenate two task-specific scalars: emotional contrast (EC), defined as the gap between the two
highest emotion logits, and sentiment confidence (SC), given by the maximum soft-max probability
produced by the sentiment classifier.</p>
          <p>Subset generators We generate nine deterministic subsets for each target size k ∈
{200, 400, . . . , 2000}, leveraging a range of complementary heuristics grounded in core-set theory,
graph analysis, and uncertainty modeling. Classical core-set methods—Max–Min, K-Means Centroids,
and Farthest-First—are commonly used for feature selection in Natural Language Processing (NLP) [18].
To exploit structural cues, we include Community selection via modularity maximization [19] and
Closeness Centrality [20], two graph-based sampling heuristics. The Density-Weighted heuristic
prioritizes low-density regions in the embedding space, enhancing representational diversity by sampling
from semantically sparse areas [21]. Emotional Conflict, a custom heuristic introduced in this work,
identifies semantically similar sentence pairs with opposing affective signals—highlighting emotionally
ambiguous cases relevant to affective representation [22]. Diversity Sampling maximizes coverage
by iteratively picking points farthest from the current subset, thereby enhancing generalization and
reducing feature redundancy [23]. Finally, Uncertainty-Based selection focuses on items near sentiment
boundaries, following Bayesian active-learning and margin-sampling principles [24].</p>
          <p>Metric-Driven Filtering Each subset is evaluated with five criteria [16]: silhouette (S),
Davies–Bouldin (DB), class balance (ℬ), diversity (D), and coverage (C). For every dataset we gather
the raw scores of all nine heuristics and all target sizes k, compute the global mean μₘ and standard
deviation σₘ of each metric m, and obtain z-scores zₘ = (m − μₘ)/σₘ. The normalized scores are
finally averaged into Ψ = (1/5)(zS − zDB + zℬ + zD + zC), and the four highest-ranked heuristics progress to
Stage II.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>2.3.2. Stage II — Combinatorial Optimization</title>
          <p>After the Stage I filtering we retain Max–Min, K-Means Centroids, Emotional Conflict and
Uncertainty-Based. For each pool P = {s₁, . . . , sₙ} and target size k ∈ {1000, 1500, 2000, 2500} (a
reduced grid chosen for runtime reasons) we flip a binary switch xᵢ ∈ {0, 1} (xᵢ = 1 selects sᵢ). The
QUBO combines four signals:
• Diversity dᵢⱼ, amplified when the two sentences come from different sentiment classes and
inverted when they coincide.
• Class-frequency fairness: diagonal bias f_cᵢ/N, where f_cᵢ is the number of instances in class cᵢ
and N is the total number of instances, that favours minority classes.
• Emotional contrast λ_emo ECᵢ on the diagonal;
• Sentiment confidence λ_sent (1 − |SCᵢ − 0.5|) on the diagonal (prefer items whose softmax
probability is close to 0.5).</p>
          <p>The QUBO matrix Q is:
Q[i, j] = f_cᵢ/N + λ_emo ECᵢ + λ_sent (1 − |SCᵢ − 0.5|), if i = j,
with the off-diagonal entries given by the sign-adjusted diversity terms dᵢⱼ described above.</p>
          <p>Hyper-parameter sweep We explore λ_emo ∈ {0.5, 1.0, 2.0, 2.5} and λ_sent ∈ {0.25, 0.5, 1.0}, crossed
with the four k values, i.e. 4 × 3 × 4 = 48 QUBOs per heuristic.</p>
          <p>Optimization. Each hyper-parameter combination (λ_emo, λ_sent, k) defines a QUBO instance, solved with Neal’s SA sampler (100 reads,
default settings). Panels are re-scored with the same metric bundle and the two best configurations per
heuristic are retained for Stage III.</p>
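<p>In the pipeline this step is handled by Neal’s sampler; purely as an illustration of what simulated annealing does to such an instance, a self-contained single-bit-flip Metropolis annealer over a QUBO dictionary (our own toy stand-in, not neal’s implementation) might look like:</p>

```python
import math
import random

def qubo_energy(Q, x):
    """Energy of bitstring x under a QUBO given as {(i, j): weight}."""
    return sum(w * x[i] * x[j] for (i, j), w in Q.items())

def anneal_qubo(Q, n, reads=100, sweeps=200, t0=2.0, t1=0.05, seed=0):
    """Independent restarts of single-bit-flip Metropolis annealing."""
    rng = random.Random(seed)
    best, best_e = None, float("inf")
    for _ in range(reads):
        x = [rng.randint(0, 1) for _ in range(n)]
        e = qubo_energy(Q, x)
        if e < best_e:
            best, best_e = x[:], e
        for s in range(sweeps):
            # geometric temperature schedule from t0 down to t1
            t = t0 * (t1 / t0) ** (s / max(sweeps - 1, 1))
            i = rng.randrange(n)
            x[i] ^= 1
            e_new = qubo_energy(Q, x)
            if e_new <= e or rng.random() < math.exp((e - e_new) / t):
                e = e_new
                if e < best_e:
                    best, best_e = x[:], e
            else:
                x[i] ^= 1  # revert rejected flip
    return best, best_e
```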
        </sec>
        <sec id="sec-2-3-3">
          <title>2.3.3. Stage III — Quantum-ready Aggregation</title>
          <p>Only the subsets generated by the K-Means Centroids and Emotional Conflict heuristics outperformed
all others in Stage II across all datasets, so we focus on those two for the quantum run. Both methods
generate datasets that exceed the variable budget of current QPUs, so each is compressed into m = 150
super-nodes via k-means, and the problem is reformulated as a second QUBO optimized with both classical
Simulated Annealing (SA) and Quantum Annealing (QA). A super-node s_j = ⟨z̄_j, ĉ_j, ē_j, p̄_j⟩ stores its
centroid, majority class, mean emotional contrast and mean sentiment confidence.</p>
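<p>As an illustrative sketch of the compression (the k-means cluster labels are assumed to be computed upstream, e.g. with scikit-learn; the function name and data layout are ours), each super-node tuple ⟨centroid, majority class, mean emotional contrast, mean sentiment confidence⟩ can be assembled as:</p>

```python
from collections import Counter

def build_supernodes(vecs, km_labels, classes, emo, conf, m=150):
    """Compress a panel into super-nodes, one per k-means cluster label."""
    nodes = []
    for c in range(m):
        idx = [i for i, lab in enumerate(km_labels) if lab == c]
        if not idx:
            continue  # skip empty clusters
        dim = len(vecs[0])
        centroid = [sum(vecs[i][d] for i in idx) / len(idx) for d in range(dim)]
        majority = Counter(classes[i] for i in idx).most_common(1)[0][0]
        nodes.append((centroid,
                      majority,
                      sum(emo[i] for i in idx) / len(idx),
                      sum(conf[i] for i in idx) / len(idx)))
    return nodes
```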
          <p>To keep the search flexible, we run the QUBO in two versions: one gently caps the solution at roughly
half of the available super-nodes (cardinality-constrained), while the other drops that limit altogether
and lets the annealer decide for itself how many super-nodes are worth keeping.</p>
          <p>QUBO without cardinality constraint. Let m = |{s_j}| and x ∈ {0, 1}^m. For super-nodes i, j define
the class indicator δ_ij = 1 if ĉ_i = ĉ_j and 0 otherwise, and the Euclidean distance d_ij = ‖z̄_i − z̄_j‖_2.
Following our implementation build_qubo_qclef_no_cardinality(), the unconstrained QUBO
reads
Q_ii = f̂_i + λ_emo ē_i + λ_sent (1 − |p̄_i − 0.5|),
with off-diagonal entries Q_ij carrying the distance d_ij, sign-flipped by the class indicator δ_ij as in
Stage II. Here f̂_i is the frequency of class ĉ_i in the panel; λ_emo and λ_sent ≥ 0 weight emotional contrast and
sentiment confidence, respectively.</p>
          <p>QUBO with cardinality constraint [25]. To steer the solution towards k = ⌈m/2⌉ selected
super-nodes we build a Binary Quadratic Model (BQM) from E_unc and add a combination penalty generated
with dimod.generators.combinations. Its strength P is set automatically to the maximum energy
spread of the unconstrained BQM:
E_card(x) = E_unc(x) + P (k − Σ_ℓ x_ℓ)², with P = bqm.maximum_energy_delta().</p>
          <p>Optimization and evaluation. Both QUBOs are solved independently with SA (same schedule as
Stage II) and QA (2 000 reads, default settings). The binary solutions are projected back to their original
sentences and rescored with the five-metric bundle.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology — Task 3: Clustering</title>
      <p>This section outlines the methodology employed for Task 3, which focuses on centroid selection aimed
at accelerating document retrieval over the ANTIQUE sentence-embedding corpus. We investigate four
complementary clustering pipelines, each designed to generate a small set of representative centroids
that preserve cluster quality and maximize downstream retrieval effectiveness.</p>
      <p>Our hybrid, QA-driven k-medoids pipeline unfolds in three steps:
1. Pivot selection: we start from the 6 513 sentence embeddings provided with the ANTIQUE corpus;
this full set of n ≈ 6500 vectors is filtered down to m = 150 well-spread candidate pivots.
2. Annealing optimization: cast a QUBO on those m pivots to identify the final k medoids.
3. Global assignment: assign every document to its nearest medoid, producing the final k clusters.</p>
      <p>Following the standard QUBO framework [15] and Bauckhage’s quadratic reduction [26], we embed
the m pivots as rows of X ∈ R^{m×d} (ℓ2-normalized). Let d_ij = 1 − cos(x_i, x_j) and introduce binary
z_i ∈ {0, 1}, with z_i = 1 if pivot i is selected. The k-medoids objective becomes the pseudo-Boolean
function
E(z) = −(α/2) Σ_{i&lt;j} d_ij z_i z_j + β Σ_i (Σ_j d_ij) z_i + γ (Σ_i z_i − k)²,
which expands, up to the constant offset γk², to the QUBO
E(z) = Σ_{i≤j} q_ij z_i z_j, with
q_ij = −(α/2) d_ij + 2γ for i ≠ j, and
q_ii = β Σ_{j=1}^{m} d_ij + γ(1 − 2k) for i = j,
where we fix α = 1/k, β = 1/m, and γ = 2, balancing dispersion, linear penalties, and enforcing
Σ_i z_i = k. The parameter γ acts as a Lagrange multiplier, which is crucial in balancing the minimization
of the objective function against the constraint that the number of selected medoids equals k. This
same parameter set is kept across all four methods (A–D) for both the exact-m reduction and the final
exact-k optimization. These values are chosen according to the guidelines provided by Bauckhage et al.
[26] to ensure a balanced contribution of the different terms in the objective function. The resulting
BinaryQuadraticModel is submitted to a D-Wave annealer (2 000 reads); the returned vector y*
flags the selected medoids.</p>
      <p>This Bauckhage QUBO offers several practical advantages: by specifying only its upper-triangular
entries, the matrix is effectively symmetric (q_ij = q_ji), simplifying minor embedding on quantum hardware; the
diagonal shift −2γk cancels the linear-term bias from the squared cardinality penalty, avoiding skew
toward too many or too few pivots; the constant offset γk² maintains equivalence with the original
objective yet can be dropped after solving without affecting cluster assignments; and, since d_ii = 0 and
most off-diagonal d_ij are small, Q is naturally sparse, reducing the solver’s memory footprint.</p>
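<p>Under the expansion above, and with an illustrative function name of our own, the upper-triangular matrix can be assembled directly from the pairwise distances:</p>

```python
def kmedoids_qubo(dist, k, alpha=None, beta=None, gamma=2.0):
    """Upper-triangular QUBO for Bauckhage-style k-medoids selection
    (sketch matching the expansion in the text): off-diagonal entries
    -(alpha/2)*d_ij + 2*gamma, diagonal beta*sum_j d_ij + gamma*(1-2k);
    the constant offset gamma*k^2 is dropped.
    """
    n = len(dist)
    alpha = alpha if alpha is not None else 1.0 / k   # dispersion weight
    beta = beta if beta is not None else 1.0 / n      # centrality weight
    Q = {}
    for i in range(n):
        Q[(i, i)] = beta * sum(dist[i]) + gamma * (1 - 2 * k)
        for j in range(i + 1, n):
            Q[(i, j)] = -(alpha / 2.0) * dist[i][j] + 2.0 * gamma
    return Q
```

On toy instances the cardinality penalty dominates, so brute-force minimization of this matrix returns exactly k selected pivots.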
      <sec id="sec-3-1">
        <title>3.2. Pivot-Selection Strategies</title>
        <p>With the QUBO core fixed, we isolate the effect of classical preprocessing by testing four distinct
pivot-selection strategies that each deliver m = 150 candidates.</p>
        <p>Method A: qIIMAS strategy (SubMedoids). Following the technique introduced by team qIIMAS in
last year’s competition [27], each document is assigned a binary variable z_i ∈ {0, 1} indicating whether
it remains a pivot. An “exact-m” QUBO over a normalized cost matrix forces the cardinality constraint
Σ_i z_i = m while minimizing pairwise similarity within the chosen subset. Solving this model on the
same annealing backend used downstream yields a maximally diverse pivot set that captures even the
rarest semantic regions of the corpus. Any surplus indices are deterministically trimmed, whereas
deficits are filled with previously unused, randomly sampled documents to ensure |P| = m before the
quantum stage.</p>
        <p>Method B: Farthest-Point Sampling (FPS). FPS adopts a simple yet powerful greedy rule: starting
from one random seed, it maintains each document’s distance to its nearest selected pivot (measured in
cosine distance) and repeatedly adds the document with the largest recorded distance until exactly m
pivots have been chosen [28]. Each iteration is a single pass over the corpus, yielding a total cost of
O(nm) time and O(n) memory. The resulting pivots provide broad geometric coverage, so the subsequent
QUBO receives a well-spread candidate pool for the final k-medoids step.</p>
        <p>Method C: MiniBatch-KMeans. Running MiniBatch-KMeans [29] with m clusters, we pick as pivot
the document closest to each centroid in cosine distance. These representatives reflect data density,
furnishing the annealer with high-quality starting points at low per-epoch cost.</p>
        <p>Method D: CLARA–CLARANS hybrid. We combine two classic medoid heuristics in a single,
two-step routine. CLARA first draws several disjoint subsamples (each roughly 20 % of the corpus), applies
PAM to every subsample, and retains the m medoids that minimise the within-subset distance. Building
on this provisional set, CLARANS performs up to twenty random medoid–non-medoid swaps, accepting
a swap only when it reduces the overall clustering cost [30]. The combination lets CLARA explore
diverse regions of the embedding space, while CLARANS fine-tunes the most promising configuration,
yielding a pivot set that achieves both broad coverage and strong internal cohesion.</p>
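<p>The FPS greedy rule described above can be sketched in a few lines (assuming ℓ2-normalized vectors, so cosine distance reduces to 1 minus a dot product; the function name is ours):</p>

```python
def farthest_point_sampling(vecs, m, seed_idx=0):
    """Greedy FPS: keep each point's cosine distance to its nearest
    selected pivot and repeatedly add the point with the largest such
    distance, O(n*m) time and O(n) memory overall.
    """
    def cos_dist(a, b):
        # vectors assumed l2-normalized, so cosine distance = 1 - dot
        return 1.0 - sum(x * y for x, y in zip(a, b))

    pivots = [seed_idx]
    d = [cos_dist(v, vecs[seed_idx]) for v in vecs]
    while len(pivots) < m:
        nxt = max(range(len(vecs)), key=lambda i: d[i])
        pivots.append(nxt)
        # single pass: update nearest-pivot distances
        d = [min(d[i], cos_dist(vecs[i], vecs[nxt])) for i in range(len(vecs))]
    return pivots
```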
      </sec>
      <sec id="sec-3-2">
        <title>Experimental workflow common to all methods (A–D)</title>
        <p>For every pivot selector—qIIMAS strategy (A), FPS (B), MiniBatch-KMeans (C), and CLARA–CLARANS
(D)—and each target size k ∈ {10, 25, 50}, we run the six-step pipeline below:
1. Initial set-up. Start with the full corpus of n ℓ2-normalized sentence embeddings and fix k.
2. Coarse compression. Apply the chosen method to obtain m ≪ n coarse clusters; their centroids
define three candidate pivot budgets m ∈ {100, 125, 150}. Validation retains m = 150 for A and
m = 100 for B–D (seed = 42).
3. Fast QUBO sweep (100 SA reads). For every ⟨k, method, m⟩ build the exact-k QUBO, solve it
once with 100 SA reads, and log validation Davies–Bouldin and nDCG@10.
4. Pivot-budget selection. Choose the m that minimizes Davies–Bouldin (nDCG@10 breaks ties);
this reproduces the preferences in step 2.
5. Full annealing run (2 000 reads). Rebuild the exact-k QUBO with the selected m and solve it
with 2 000 reads on the designated annealer (SA or QA) to obtain the k medoids.
6. Global assignment and metrics. Attach each of the n embeddings to its nearest medoid, then
compute final Davies–Bouldin and mean nDCG@10.</p>
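<p>The global-assignment step (step 6) amounts to a nearest-medoid lookup; a minimal sketch, again assuming ℓ2-normalized vectors and a function name of our own, is:</p>

```python
def assign_to_medoids(vecs, medoid_ids):
    """Attach every embedding to its nearest medoid by cosine distance
    (vectors assumed l2-normalized), returning one medoid id per vector."""
    def cos_dist(a, b):
        return 1.0 - sum(x * y for x, y in zip(a, b))
    return [min(medoid_ids, key=lambda m: cos_dist(v, vecs[m])) for v in vecs]
```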
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>This section presents the experimental setup and results for evaluating our proposed approaches across
two main tasks: Task 2 (Instance Selection) and Task 3 (Clustering). We detail the evaluation scenarios,
including datasets, submission configurations, and whether Simulated Annealing (SA) or Quantum
Annealing (QA) was used. The results are reported as computed by the challenge organizers. Finally,
we provide a discussion analyzing the trade-offs between performance and efficiency achieved by our
methods compared to baselines and competing submissions.</p>
      <sec id="sec-4-1">
        <title>4.1. Evaluation Scenarios</title>
        <p>The evaluation was divided into two main scenarios corresponding to the official tasks defined in the
challenge. The selected tasks were: (i) Task 2: Instance Selection and (ii) Task 3: Clustering. Each
task involved the use of SA and QA approaches.</p>
        <sec id="sec-4-1-1">
          <title>4.1.1. Task 2: Instance Selection</title>
          <p>Task 2 focuses on selecting the most representative subsets of training data for fine-tuning a sentiment
classification model (Llama3.1), aiming to reduce computational cost while preserving effectiveness.
The evaluation considered both the macro F1 score and the reduction rate. We submitted several
configurations using both SA and QA methods on the Vader NYT and Yelp datasets. Table 2 summarizes
our submitted runs for Task 2, indicating which approaches were applied to each dataset using SA and
QA.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.1.2. Task 3: Clustering</title>
          <p>Task 3 involves clustering sentence embeddings to support efficient document retrieval and exploration.
Quality was evaluated both intrinsically with the Davies–Bouldin Index and extrinsically with nDCG@10
on test queries. We submitted clustering results on the large and small variants of the ANTIQUE dataset
using both SA and QA. Table 3 summarizes our submitted runs for Task 3, indicating which approaches
were applied to each dataset using SA and QA.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Results</title>
        <p>Tables 4 and 5 present a summary of the results for Task 2, using the Vader and the Yelp datasets
respectively. They show the performance of our approaches—SentimentPairs (SP) and LocalSet for
both datasets, and Pre-Selection and Statistical Tuning submitted in the variants emoconflictCard
and SentimentKmeansCard only for the Yelp dataset—compared to the baseline and the top-performing
submission in terms of average macro F1 score. Results are averaged over five cross-validation folds,
where available. The Evaluation Time column combines fine-tuning and prediction time (in seconds),
while the Annealing Time refers to the total execution time for QA approaches, including programming,
sampling, and post-processing. Submissions marked with an asterisk (*) did not provide results for all
five folds due to issues in the challenge’s execution infrastructure, which were beyond the control of
both participants and organizers. The SentimentPairs (SP) variant was executed with reads=2000 and
limit=True.</p>
        <p>Table 6 summarises the Task 3 outcomes on the ANTIQUE corpus obtained using only SA. For each
target panel size k ∈ {10, 25, 50} we report the nDCG@10 and the Davies–Bouldin index computed on the
entire dataset. The Annealing time column aggregates programming, sampling, and post-processing of
the SA search, executed with the default neal schedule (2 000 reads, default settings).</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Discussion</title>
        <p>This section discusses the main findings of our participation in the QuantumCLEF Lab, organized by
task. We analyze the results obtained in the instance selection task (Section 4.3.1) and the clustering
task (Section 4.3.2), highlighting the effectiveness of our proposed methods, their trade-offs, and their
relative performance compared to other participating systems and the baseline.</p>
        <sec id="sec-4-3-1">
          <title>4.3.1. Task 2: Instance Selection</title>
          <p>As shown in Table 4, our approaches—SentimentPairs (SP) and LocalSet—demonstrate effective
trade-offs between data reduction and performance. Compared to the baseline, which uses the full
dataset and achieves the highest macro F1, our best-performing approach enables a 50% reduction in
data, translating into nearly half the fine-tuning and prediction time, at the cost of a moderate decrease
in macro F1. When compared to the top competing system in terms of macro F1, our approach achieves
only 2.6% lower macro F1 on average while using half as much data, effectively doubling the reduction
rate (25% → 50%).</p>
          <p>Among our methods, the LocalSet approach yields the highest macro F1 (63.3) with an average data
reduction of 50%, striking a strong balance between efficiency and accuracy. Our SentimentPairs variants
that include pair-related documents trade only 1.1% of macro F1 for up to 70% data reduction. In contrast,
the strictest SentimentPairs configurations, which retain only the most relevant documents, reach
reduction rates of 83–95%, though at a more pronounced cost in performance (macro F1 drops from 62%
to 47%). These results highlight the flexibility of our methods in tailoring the reduction–performance
trade-off for different application needs.</p>
          <p>The results in Table 5 also reveal a good balance between effectiveness and reduction for our methods.
Our best-performing method on this dataset achieves a 50% reduction of the dataset without affecting
the macro F1 score with respect to the baseline. Compared to the top-performing submission in
terms of macro F1, our approach incurs only a 0.1% decrease in this metric while doubling the dataset
reduction.</p>
          <p>Our methods present varied and versatile solutions to the instance-selection bi-objective problem on
the Yelp dataset. Our LocalSet approach achieves the highest macro F1 score (99.4) while halving the Yelp
dataset size. Our SentimentPairs variant including pair-related documents achieves a higher reduction
(62.7%), cutting fine-tuning and prediction time by 59.4% while giving up only 0.2% of the F1 score
with respect to the baseline. The maximum evaluation-time reduction (91.6%) is achieved by our
SentimentPairs variant including just-final documents, at the expense of decreasing the F1 metric by
only 8.6%. Finally, our two variants of the Pre-Selection and Statistical Tuning method occupy a middle
ground between our high-performance methods (LocalSet and SP pair-related) and our high-reduction
one (SP just-final). These variants have been executed using both SA and QA. The emoconflictCard
variant achieves slightly better results, with an average macro F1 score only 0.7% behind the baseline
and an average reduction of 71.5% of the dataset, resulting in an average reduction of 67.8% of the
fine-tuning and prediction time.</p>
          <p>Overall, our results in Task 2 show that it is possible to substantially reduce the size of
sentiment-labeled datasets while maintaining competitive classification performance. Among our methods, the
LocalSet approach stands out for achieving the highest effectiveness across both datasets, consistently
delivering top macro F1 scores while halving the training data. This makes it particularly attractive for
scenarios requiring balanced trade-offs between performance and computational efficiency. The diversity
of our approaches—ranging from aggressive reduction strategies to more conservative selections—also
highlights the adaptability of quantum-inspired instance selection techniques to various application
needs and constraints.</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>4.3.2. Task 3: Clustering</title>
          <p>As shown in Table 6, the SA runs submitted by our team outperform the official baseline for every
tested cluster size k and in every evaluation metric. Compared to other submissions, ours match or
outperform them in all scenarios except when k = 10, where the difference is negligible. For k = 10,
our 3-FPS-Medoids configuration achieves an nDCG@10 of 0.578, a relative gain of 5% over the baseline
(0.551) and virtually the same effectiveness as the best-performing submission regarding nDCG@10
(0.580). Additionally, all our variants reduce the Davies–Bouldin index from 7.99 to at most 6.43; this
improvement comes at a computational cost of approximately 15 s of annealing time versus
the 0.008-second runtime of other participant submissions. Increasing the cluster size to k = 25,
our 3-FPS-Medoids approach remains in the lead with nDCG@10 = 0.547 (+4% over baseline), and our
remaining methods collectively reduce the DBI from 6.12 to 5.56, although their runtime increases to
20–40 s. The difference increases for k = 50: our 3-FPS-Medoids method reaches 0.559
nDCG@10, corresponding to a gain of 20% over the baseline (0.466), and achieves the lowest DBI (4.45).
In contrast, the submission with the best DBI has a runtime of less than 0.3 s, but its nDCG@10 drops
dramatically to 0.006.</p>
          <p>Overall, each of our variants offers a superior effectiveness–cohesion profile relative to the
baseline, and the SA framework scales robustly with list length, unlike the faster but markedly less
stable approaches of the other submissions. Among our methods, 3-FPS-Medoids provides the best
trade-off between retrieval quality and cluster compactness; MBK-Medoids consistently yields the
tightest clusters; CLARA–CLARANS occupies a middle ground, delivering effectiveness and cohesion
above the baseline with the lowest runtime of our SA runs, making it attractive when the computational
budget is limited; and SubMedoids-QUBO maintains solid effectiveness at the expense of the highest
computational cost.</p>
          <p>It is important to note that, while the comparison with other submissions provides useful insights,
it may not be entirely conclusive due to a structural difference in how centroids were represented.
Three of our four submissions—SubMedoids-QUBO, CLARA–CLARANS, and MBK-Medoids—generate
centroids with lower dimensionality than the original embeddings. This dimensionality reduction,
although beneficial for improving cluster compactness and interpretability, may have introduced a
disadvantage under the evaluation protocol, particularly affecting metrics like nDCG@10 that are
sensitive to representation format. As the evaluation setup did not anticipate lower-dimensional
centroids, the relative effectiveness of these methods should be interpreted with care.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>This work presented our approaches and findings for the instance selection and clustering tasks defined
in the Quantum CLEF Lab. Our methods achieved competitive results in both scenarios, obtaining
top-ranked performance in multiple evaluations. In Task 2, our LocalSets-based strategy achieved
the highest effectiveness while maintaining a substantial reduction in training data. In Task 3, our
FPS-Medoids approach yielded the best clustering results in terms of nDCG@10. Overall, our results
demonstrate effective trade-offs between performance and reduction, showing that it is possible to
significantly reduce the amount of training data or clustering complexity without incurring major
performance losses.</p>
      <p>Key contributions of this work include the following.</p>
      <p>• We present three multi-paradigm approaches to the instance selection problem, each leveraging
complementary principles: (i) sentiment-aware document pairing, (ii) local similarity-based
criteria, and (iii) classical heuristic methods for data reduction.
• We propose a unified clustering framework specifically designed for quantum annealing, which
integrates four distinct pivot-selection strategies to optimize centroid-based grouping in
high-dimensional embedding spaces.</p>
      <p>Future Work. Future work will focus on addressing current limitations and exploring extensions
of our proposed methods. Due to infrastructure constraints during the challenge, quantum annealing
could not be fully leveraged across all configurations; future experiments will aim to complete the
evaluation of QA-based approaches. In Task 2, our SentimentPairs and LocalSets methods relied solely
on precomputed embeddings, omitting potentially informative textual features such as readability or
syntactic structure. Preliminary results from other configurations suggest that integrating custom
embeddings or textual signals may enhance performance. Additionally, solving Task 2 involved tackling
several clustering-related subproblems. Some of the techniques developed, such as LocalSets, could be
adapted for Task 3. However, this would require automatically labeling the unlabeled documents in
Task 3 to enable the application of supervised or semi-supervised strategies.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>Funding: The research work conducted is part of the R&amp;D projects “CORTEX: Conscious Text
Generation” (PID2021-123956OB-I00), funded by MCIN/ AEI/10.13039/501100011033/ and by “ERDF A
way of making Europe”; “CLEAR.TEXT:Enhancing the modernization public sector organizations by
deploying Natural Language Processing to make their digital content CLEARER to those with
cognitive disabilities” (TED2021-130707B-I00), funded by MCIN/AEI/10.13039/501100011033 and “European
Union NextGenerationEU/PRTR”; “VIVES: “Pla de Tecnologies de la Llengua per al valencià” project
(2022/TL22/00215334) as subproject of “ILENIA” project (grant number 2022/TL22/00215337) from the
Ministerio de Transformación Digital y por el Plan de Recuperación, Transformación y Resiliencia –
Financiado por la Unión Europea – NextGenerationEU; and the project “NL4DISMIS: Natural Language
Technologies for dealing with dis- and misinformation with grant reference (CIPROM/2021/021)" funded
by the Generalitat Valenciana and “GEO.IA PLATAFORMA DE GEOINTELIGENCIA ARTIFICIAL PARA
RESOLVER PROBLEMAS A LOS CIUDADANOS Y FACILITAR LA TOMA DE DECISIONES
ESTRATÉGICAS EN LAS ADMINISTRACIONES PÚBLICAS” (INNEST/2023/344) funded by Agencia Valenciana
de la Innovación under the “Programa Comunitat Valenciana Fondo Europeo de Desarrollo Regional
(FEDER) 2021-2027”. Additionally, this work is funded by the Ministerio para la Transformación Digital
y de la Función Pública and Plan de Recuperación, Transformación y Resiliencia - Funded by EU –
NextGenerationEU within the framework of the project Desarrollo Modelos ALIA.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the author(s) used ChatGPT and Grammarly for grammar
and spelling checking. After using these tools/services, the author(s) reviewed and edited the content as
needed and take(s) full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pasin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Dacrema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cuhna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <year>Quantumclef 2025</year>
          :
          <article-title>Overview of the second quantum computing challenge for information retrieval and recommender systems at CLEF</article-title>
          , in: G. Faggioli,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rosso</surname>
          </string-name>
          , D. Spina (Eds.),
          <source>Working Notes of CLEF 2025 - Conference and Labs of the Evaluation Forum, CEUR Workshop Proceedings</source>
          ,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pasin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Dacrema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Cuhna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Gonçalves</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          , Overview of quantumclef
          <year>2025</year>
          :
          <article-title>The second quantum computing challenge for information retrieval and recommender systems at CLEF</article-title>
          , in: J.
          <string-name>
            <surname>Carrillo-de-Albornoz</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Gonzalo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Plaza</surname>
          </string-name>
          ,
          <string-name>
            <surname>A. G. S. de Herrera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Mothe</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Piroi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Rosso</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Spina</surname>
          </string-name>
          , G. Faggioli, N. Ferro (Eds.),
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the Sixteenth International Conference of the CLEF Association (CLEF</source>
          <year>2025</year>
          ), Lecture Notes in Computer Science,
          <year>2025</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Kaufman</surname>
          </string-name>
          , P. J.
          <string-name>
            <surname>Rousseeuw</surname>
          </string-name>
          ,
          <article-title>Finding groups in data: an introduction to cluster analysis</article-title>
          , John Wiley &amp; Sons,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Murtagh</surname>
          </string-name>
          ,
          <article-title>A survey of recent advances in hierarchical clustering algorithms</article-title>
          ,
          <source>The computer journal 26</source>
          (
          <year>1983</year>
          )
          <fpage>354</fpage>
          -
          <lpage>359</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>S. RR</surname>
          </string-name>
          ,
          <article-title>A statistical method for evaluating systematic relationships</article-title>
          ,
          <source>Univ Kans sci bull 38</source>
          (
          <year>1958</year>
          )
          <fpage>1409</fpage>
          -
          <lpage>1438</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Glover</surname>
          </string-name>
          , G. Kochenberger,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <article-title>A tutorial on formulating and using qubo models</article-title>
          , arXiv preprint arXiv:
          <year>1811</year>
          .
          <volume>11538</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Kochenberger</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.-K. Hao</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Glover</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          <string-name>
            <surname>Lü</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>The unconstrained binary quadratic programming problem: a survey</article-title>
          ,
          <source>Journal of combinatorial optimization 28</source>
          (
          <year>2014</year>
          )
          <fpage>58</fpage>
          -
          <lpage>81</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Brighton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mellish</surname>
          </string-name>
          ,
          <article-title>Advances in instance selection for instance-based learning algorithms</article-title>
          , Data
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>