<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Embedding-Aware Quantum-Classical SVMs for Scalable Quantum Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastián Andrés Cajas Ordóñez</string-name>
          <email>sebastian.cajasordonez@ucd.ie</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis Fernando Torres Torres</string-name>
          <email>lf.torres@udea.edu.co</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario Bifulco</string-name>
          <email>mario.bifulco@unito.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carlos Andres Duran</string-name>
          <email>carlos.duran@unicauca.edu.co</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cristian Bosch</string-name>
          <email>cristian.boschserrano@ucd.ie</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ricardo Simon Carbajo</string-name>
          <email>ricardo.simoncarbajo@ucd.ie</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Corporation for Aerospace Initiatives (CASIRI), University of Cauca</institution>
          ,
          <addr-line>Popayán</addr-line>
          ,
          <country country="CO">Colombia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, University of Torino</institution>
          ,
          <addr-line>Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Irish Centre for AI (CeADAR), University College Dublin (UCD)</institution>
          ,
          <addr-line>Dublin</addr-line>
          ,
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>SISTEMIC Research Group, University of Antioquia</institution>
          ,
          <addr-line>Medellín</addr-line>
          ,
          <country country="CO">Colombia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Quantum Support Vector Machines face scalability challenges due to high-dimensional quantum states and hardware limitations. We propose an embedding-aware quantum-classical pipeline combining class-balanced k-means distillation with pretrained Vision Transformer embeddings. Our key finding: ViT embeddings uniquely enable quantum advantage, achieving up to 8.02% accuracy improvements over classical SVMs on FashionMNIST and 4.42% on MNIST, while CNN features show performance degradation. Using 16-qubit tensor network simulation via cuTensorNet, we provide the first systematic evidence that quantum kernel advantage depends critically on embedding choice, revealing fundamental synergy between transformer attention and quantum feature spaces. This provides a practical pathway for scalable quantum machine learning that leverages modern neural architectures.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Quantum computing has emerged as a transformative paradigm with the potential to outperform
classical approaches on specialized computational tasks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Concurrently, machine learning (ML)
continues advancing rapidly, driven by increasing data availability and accelerated computing hardware
[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Quantum machine learning (QML), at the intersection of these fields, has significant potential to
unlock new capabilities in data processing and complex problem solving [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. By utilizing quantum
phenomena such as superposition and entanglement, QML algorithms may effectively address high
dimensionality and combinatorial complexity beyond classical counterparts [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ].
      </p>
      <p>
        Despite these prospects, large-scale quantum computing faces substantial practical challenges from
noise and decoherence. Advanced error correction protocols can address these issues through
sophisticated decoding algorithms, though their design complexity has led researchers to increasingly employ
machine learning approaches for automation and enhancement of these decoding tasks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Beyond
error correction, QML applications have been explored extensively across domains such as image
classification, natural language processing, and high energy physics [
        <xref ref-type="bibr" rid="ref10 ref6 ref7 ref8 ref9">6, 7, 8, 9, 10</xref>
        ]. Although proof of
concept demonstrations on quantum processors like Google’s Sycamore [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and IBM’s superconducting
systems [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ] show feasibility, significant gaps persist between laboratory experiments and reliable
industrial deployment [14, 15].
      </p>
      <p>CEUR Workshop Proceedings (ISSN 1613-0073)</p>
      <p>
        A promising strategy for overcoming these gaps involves leveraging embedding and dimensionality
reduction methods. Classical preprocessing techniques such as principal component analysis, or
learned neural encoders like variational autoencoders, effectively reduce dataset complexity prior
to quantum model input [
        <xref ref-type="bibr" rid="ref8 ref16">8, 16</xref>
        ]. These hybrid approaches help manage limited qubit resources on
current quantum hardware while exploiting quantum enhanced feature spaces [
        <xref ref-type="bibr" rid="ref17 ref5">17, 5</xref>
        ]. Benchmark
studies comparing quantum and classical ML approaches underline the potential and present limitations
of QML [18, 14, 15]. Advancing this field requires systematic evaluations under realistic conditions,
incorporating representative datasets, accurate noise models, and relevant performance metrics [
        <xref ref-type="bibr" rid="ref2 ref3 ref19">2, 3, 19</xref>
        ].
      </p>
      <p>We propose an embedding-aware, hybrid quantum-classical QSVM framework designed to address
the scalability limitations of quantum machine learning. By integrating class-balanced k-means data
distillation with pretrained embeddings, our pipeline reduces data dimensionality while preserving
task-relevant structure. Quantum kernel classification is performed using tensor network simulation
with NVIDIA’s cuTensorNet [20, 21]. Benchmarking on MNIST and Fashion-MNIST shows that this
embedding-driven approach consistently outperforms classical and quantum baselines in both accuracy
and efficiency, confirming its value for scalable and resource-constrained quantum machine learning
applications.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Scaling QML with Simulation Frameworks</title>
        <p>Simulating larger circuits remains a popular strategy because near-term quantum devices have limited
qubits. Efficient tensor-network methods can push simulations of quantum support vector machines
(QSVMs) to hundreds of qubits [21], addressing scaling issues that plague naïve state-vector simulators.
This line of research proves instrumental for prototyping advanced QML algorithms, guiding their
eventual deployment on real hardware [16, 22].</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Quantum Classifiers for Image Recognition</title>
        <p>
          Variational quantum classifiers (VQCs) built on variational quantum circuits [
          <xref ref-type="bibr" rid="ref6 ref23 ref22">6, 23, 22</xref>
          ], quantum kernels,
and hybrid architectures [
          <xref ref-type="bibr" rid="ref7 ref10 ref3 ref24 ref25 ref26">7, 10, 3, 24, 25, 26</xref>
          ] have been applied to standard benchmarks such as
MNIST, Fashion-MNIST, and medical imaging tasks [
          <xref ref-type="bibr" rid="ref6 ref7 ref9 ref27">6, 7, 9, 27</xref>
          ]. Although classical deep networks
often outperform small quantum models on large-scale datasets, quantum classifiers show competitive
performance in data-scarce or high-dimensional settings by leveraging specialized embeddings and
kernel methods [
          <xref ref-type="bibr" rid="ref8 ref26">8, 26</xref>
          ]. Several studies have also demonstrated end-to-end quantum classification
pipelines executed on actual hardware, though typically limited to smaller datasets due to current
device constraints [
          <xref ref-type="bibr" rid="ref10 ref6">6, 10</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Hybrid Classical-Quantum Techniques</title>
        <p>
          Several researchers propose hybrid approaches: classical layers for data preprocessing or encoding,
followed by quantum layers for feature transformation or classification [
          <xref ref-type="bibr" rid="ref7 ref8 ref16">7, 8, 16</xref>
          ]. These methods can
offset limited qubit counts by handing only compressed or task-relevant information to the quantum
circuit [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Genetic algorithms [16], autoencoders [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], and transfer learning [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] have all been employed
to optimize these hybrid models.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Applications Beyond Image Classification</title>
        <p>
          QML has also been trialed in domains like high-energy physics [
          <xref ref-type="bibr" rid="ref8 ref13">8, 13</xref>
          ], medical diagnosis [
          <xref ref-type="bibr" rid="ref7 ref27">7, 27</xref>
          ], and
scientific computing [21]. In certain specialized tasks - e.g., identifying the Higgs boson in proton
collision data [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] - quantum models can match or exceed classical baselines under realistic noise models.
These studies highlight the versatility of QML but also the pressing need for systematic benchmarking
to compare cost-benefit trade-offs [18, 15].
        </p>
        <p>
          As a whole, prior works illustrate QML’s broad applicability, from error decoding [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and fundamental
reviews [
          <xref ref-type="bibr" rid="ref2 ref14">2, 14</xref>
          ] to specialized classifiers for real-world tasks [
          <xref ref-type="bibr" rid="ref10 ref6 ref7">6, 7, 10</xref>
          ]. However, two critical gaps
remain in the literature. First, a cohesive comparison that unifies insights across multiple domains
and positions these results against robust classical baselines remains a key frontier [14, 15]. Second,
existing approaches lack systematic investigation of how different embedding strategies affect quantum
advantage, particularly the synergy between modern neural representations and quantum feature
spaces.
        </p>
        <p>The present study addresses both gaps by systematically evaluating quantum-enhanced classification
models alongside classical baselines using diverse embedding strategies, revealing fundamental
relationships between representation choice and quantum kernel performance on representative datasets.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Strategy Overview</title>
        <p>Our approach addresses the scalability challenges of quantum machine learning through a hybrid
quantum-classical pipeline that strategically combines data preprocessing, feature extraction, and
quantum kernel methods. As illustrated in Figure 1, the framework operates through eight sequential
stages: we begin with image data extraction and preprocessing, followed by data distillation using
class-balanced k-means clustering to reduce dataset size while maintaining representative samples. Next,
we generate vector representations using pretrained models such as EfficientNet-B3 [28] and Vision
Transformer variants [29], then apply Principal Component Analysis (PCA) to compress embeddings
and match quantum hardware constraints. The processed embeddings are used to design a Quantum
Support Vector Machine (QSVM) using the Tensor Network Support Matrix (TNSM) framework [21],
which constructs quantum kernels through parameterized circuits and tensor network simulation using
a data re-uploading and compute-uncompute strategy. Finally, we evaluate model performance through
cross-validation and test on a held-out validation set to assess generalization capability. This
embedding-aware strategy enables us to leverage the representational power of modern neural architectures while
exploiting quantum kernel advantages for classification tasks, making quantum machine learning more
practical and scalable for real-world applications.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Quantum Model Architecture</title>
        <p>The work of Chen et al. [21] provides the foundational simulation framework upon which our quantum model is
built. The architecture employs a parameterized quantum circuit that encodes input data through
rotational gates and entanglement layers within a Block-Encoded State (BPS) circuit, as shown in
Figure 2. This data re-uploading circuit design has been validated in quantum learning applications
and implemented within the Qiskit framework [30], serving as a proven foundation for kernel-based
quantum classification.</p>
        <p>The model constructs a quantum kernel using a compute–uncompute strategy applied to the BPS circuit,
parameterized on the input examples. This method enables the model to capture complex relationships
between data points efectively. The resulting circuits are mapped onto tensor networks using the
CircuitToEinsum converter, enabling eficient simulation on classical hardware. The kernel matrix is
computed by contracting the tensor networks for all training pairs using an autotuned contraction path.</p>
        <p>Implementation Optimizations. In contrast to the original implementation [21], we introduce
several performance enhancements to the operand construction pipeline. First, to reduce redundant
computations of trigonometric and exponential operations across repeated input angles, we apply
function-level caching using Python’s @cache decorator. This memoization strategy significantly
reduces overhead during the generation of gate matrices (e.g., parameterized rotation gates), which are
frequently reused across multiple operand evaluations. Additionally, we precompute sine and cosine
values in a shared utility function to avoid duplicating expressions and improve code modularity. Operand
batches are generated via list comprehensions instead of iterative append calls, enhancing memory
efficiency and readability. Likewise, tensor network amplitudes are computed using preallocated lists,
eliminating dynamic resizing and reducing garbage collection pressure.</p>
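        <p>As a minimal sketch of the caching strategy described above (the function names below are illustrative, not those of the original implementation):</p>

```python
from functools import cache

import numpy as np

@cache
def _half_angle(theta: float):
    """Shared utility: compute sin/cos of theta/2 once per distinct angle."""
    return np.cos(theta / 2.0), np.sin(theta / 2.0)

@cache
def ry_matrix(theta: float):
    """Parameterized RY gate matrix; repeated angles are served from the
    cache instead of re-evaluating trigonometric functions."""
    c, s = _half_angle(theta)
    return np.array([[c, -s], [s, c]], dtype=np.complex128)

# Operand batches via a list comprehension rather than repeated append calls.
angles = [0.1, 0.7, 0.1, 0.7, 0.1]      # repeated angles hit the cache
operands = [ry_matrix(t) for t in angles]
```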
        <p>These optimizations form our enhanced baseline, designated as Baseline+, while the original
implementation is referred to as Baseline. All comparisons throughout this paper use Baseline+ as the
reference. The improvements collectively reduce execution time and peak memory usage during
simulation, as demonstrated in Table 3, and establish a foundation for future CuPy-based parallelization in
operand template generation.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Quantum Kernel Formulation</title>
        <p>
          The quantum kernel between two data points x_i and x_j is computed as the transition amplitude between
their corresponding quantum states [
          <xref ref-type="bibr" rid="ref3 ref31">3, 31</xref>
          ]:
K(x_i, x_j) = |⟨ψ(x_i)|ψ(x_j)⟩|²   (1)
where |ψ(x)⟩ = U(x)|0⟩^⊗n represents the quantum feature map implemented by our parameterized
circuit U(x). The quantum advantage emerges from the exponentially large Hilbert space dimension 2^n
(for n qubits) compared to the classical feature space [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. However, this advantage is critically dependent on how
classical data x is embedded before quantum encoding.
        </p>
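        <p>For intuition, the kernel of Eq. (1) can be evaluated exactly for a toy feature map with a NumPy statevector computation. The single-RY-per-qubit encoding below is an illustrative stand-in for the paper's BPS circuit, not the actual architecture:</p>

```python
import numpy as np

def feature_state(x: np.ndarray) -> np.ndarray:
    """|psi(x)> = U(x)|0...0> for a toy map: one RY(x_k) rotation per qubit."""
    state = np.array([1.0 + 0j])
    for theta in x:
        qubit = np.array([np.cos(theta / 2), np.sin(theta / 2)], dtype=complex)
        state = np.kron(state, qubit)          # product state over qubits
    return state

def quantum_kernel(xi: np.ndarray, xj: np.ndarray) -> float:
    """K(xi, xj) = |<psi(xi)|psi(xj)>|^2, the compute-uncompute overlap."""
    return abs(np.vdot(feature_state(xi), feature_state(xj))) ** 2

def gram_matrix(X: np.ndarray) -> np.ndarray:
    """Kernel matrix over all training pairs, as consumed by the SVM."""
    n = len(X)
    K = np.empty((n, n))
    for a in range(n):
        for b in range(a, n):
            K[a, b] = K[b, a] = quantum_kernel(X[a], X[b])
    return K
```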
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Setup</title>
      <sec id="sec-4-1">
        <title>4.1. Dataset and Data Distillation Process</title>
        <p>Due to the computational complexity of scaling Quantum Support Vector Machines (QSVMs) to
high-dimensional data, particularly in the context of image classification, we adopt a distilled version of the
dataset to reduce resource requirements while preserving performance.</p>
        <p>MNIST Dataset: We use the MNIST dataset [32], a widely recognized benchmark for evaluating
image classification models. It consists of 70,000 grayscale images of handwritten digits (0–9), each of
size 28 × 28 pixels, divided into 60,000 training and 10,000 test samples.</p>
        <p>Fashion-MNIST Dataset: We also utilize the Fashion-MNIST dataset [33], a benchmark dataset
designed as a more challenging alternative to MNIST. It comprises 70,000 grayscale images of 10 fashion
item categories (e.g., t-shirts, trousers, dresses), each of size 28 × 28 pixels, split into 60,000 training and
10,000 test samples. This dataset provides a diverse set of visual patterns, enabling robust evaluation of
our QSVM pipeline in a multi-class classification setting.</p>
        <p>Data Distillation: To address QSVM scalability constraints, we employ a class-balanced dataset
distillation approach based on k-means clustering. The algorithm iterates through each class, applies
k-means with k = 200 to identify representative centroids, and selects the real data point closest to
each centroid as a prototype, yielding exactly 200 samples per class. The resulting distilled dataset
contains 2,000 samples total (1,600 for training, 400 for testing), reducing computational complexity
from O(70000²) to O(1600²) kernel evaluations while preserving representative coverage of each class’s
feature distribution and eliminating class imbalance effects. The distillation parameters (the k value and
dataset size) can be customized in our implementation based on available computational resources and
hardware constraints, enabling adaptation to different quantum simulation capabilities.
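        <p>A minimal NumPy version of this class-balanced distillation step might look as follows (plain Lloyd's k-means; the paper uses k = 200 per class, reduced here for illustration):</p>

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: return k centroids of X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids

def distill(X, y, per_class=200):
    """Class-balanced distillation: run k-means within each class and keep
    the real sample nearest to each centroid as a prototype."""
    Xd, yd = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        cents = kmeans(Xc, per_class)
        # For each centroid, the index of the closest real data point.
        d2 = ((Xc[:, None, :] - cents[None, :, :]) ** 2).sum(axis=-1)
        nearest = d2.argmin(axis=0)
        Xd.append(Xc[nearest])
        yd.append(np.full(per_class, c))
    return np.concatenate(Xd), np.concatenate(yd)
```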
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Embedding Extraction and Data Compression</title>
        <p>To construct compact and informative inputs for quantum classification, we extract high-dimensional
feature embeddings using pretrained image encoders. Specifically, we employ EfficientNet-B3 [28] and
Vision Transformer (ViT) variants [29] trained via the CLIP framework [34]. These models, pretrained
on large-scale datasets, capture rich semantic features that are well-suited for downstream classification
tasks.</p>
        <p>EficientNet-B3 produces 1536-dimensional embeddings, while ViT models typically output 768 or
512-dimensional vectors. To evaluate trade-ofs between representation richness and simulation cost,
we experiment with three dimensionality settings: 512, 768, and 1536, across diferent architectures.
For lower-dimensional settings, we apply Principal Component Analysis (PCA) to reduce embedding
size while preserving key variance, thereby aligning inputs with the capacity limits of our 16-qubit
quantum kernel simulator.</p>
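        <p>The PCA compression step can be sketched directly with an SVD (equivalent to a fitted PCA up to component sign); the dimensions below are illustrative, not the paper's exact settings:</p>

```python
import numpy as np

def pca_reduce(E: np.ndarray, n_components: int):
    """Project embeddings E (n_samples x dim) onto the top principal
    components, retaining maximal variance in n_components dimensions."""
    mean = E.mean(axis=0)
    Ec = E - mean                                  # center the embeddings
    U, S, Vt = np.linalg.svd(Ec, full_matrices=False)
    components = Vt[:n_components]                 # top right-singular vectors
    explained_ratio = (S[:n_components] ** 2) / (S ** 2).sum()
    return Ec @ components.T, components, mean, explained_ratio

# Example: compress 128-dimensional toy "embeddings" down to 16 features,
# e.g. to match a 16-qubit encoding budget.
E = np.random.default_rng(0).normal(size=(40, 128))
Z, components, mean, ratio = pca_reduce(E, 16)
```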
        <p>To benchmark performance across embedding strategies, we map each encoder-dimension pair to a
shorthand label used throughout our analysis (e.g., ViT-B/16-512, EffNet-1536). Table 1 summarizes
these configurations, including the native embedding size and dimensionality used in simulation.</p>
        <p>As a point of reference, we include two baselines: the Baseline, based on Chen et al.’s original
QSVM using flattened image pixels, and our enhanced version, Baseline+, which incorporates the
computational enhancements introduced in Section 3.2. All remaining models use the same enhanced
QSVM backend as Baseline+, differing only in their embedding source and size.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Evaluation Methodology</title>
        <p>Model performance is assessed through 5-fold cross-validation to ensure robust statistical evaluation.
We measure classification accuracy, precision, F1-score, and Area Under the Curve (AUC) to provide
comprehensive performance characterization. Computational efficiency is evaluated by tracking total
execution time and peak memory usage during training and evaluation phases.</p>
        <p>Classical SVMs are implemented using scikit-learn’s SVC with an RBF kernel and hyperparameters
(C=1.0, gamma=’scale’, probability=True). All preprocessing steps, including embedding extraction and PCA
dimensionality reduction, are identical between classical and quantum approaches. This ensures that
performance differences reflect only the kernel computation method rather than data preparation
artifacts.</p>
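        <p>The classical baseline configuration above corresponds to the following scikit-learn setup (shown on synthetic two-class data for illustration):</p>

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for the PCA-reduced embeddings used in the paper.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.3, size=(50, 8)),
               rng.normal(1, 0.3, size=(50, 8))])
y = np.array([0] * 50 + [1] * 50)

# Same hyperparameters as the classical baseline: RBF kernel, C=1.0,
# gamma='scale', with probability estimates enabled for AUC computation.
clf = SVC(kernel="rbf", C=1.0, gamma="scale", probability=True)
clf.fit(X, y)
proba = clf.predict_proba(X)[:, 1]   # class-1 probabilities for AUC
```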
        <p>Our evaluation directly contrasts quantum support vector machines against classical SVM baselines
using identical feature representations and evaluation protocols. This approach isolates the impact of
quantum kernel methods from preprocessing effects, enabling fair assessment of quantum advantage
claims. Statistical significance is evaluated through cross-validation consistency and standard deviation
analysis.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Computational Infrastructure</title>
        <p>All experiments are conducted on NVIDIA A100 Tensor Core GPUs with 80GB HBM2 memory,
using CUDA 12.0 and NVIDIA’s cuQuantum cuTensorNet backend [20] for quantum simulation. This
high-performance computing environment ensures consistent benchmarking conditions and enables
efficient tensor network contraction for quantum kernel computation. The GPU-accelerated
simulation framework allows us to explore larger quantum circuits than would be feasible with CPU-based
approaches.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Analysis</title>
      <sec id="sec-5-1">
        <title>5.1. Quantum Advantage with Modern Neural Embeddings</title>
        <p>Our central finding demonstrates that quantum support vector machines achieve consistent performance
improvements over classical SVMs when using Vision Transformer embeddings, while showing degraded
performance with raw pixels or CNN vector embeddings for this specific setting. Table 2 presents our
key results: Quantum models with ViT embeddings achieve accuracy gains up to 4.4% on MNIST and
8.0% on Fashion-MNIST, while traditional approaches (raw pixels, EffNet features) show performance
degradation.</p>
        <p>Baseline+ reduced runtime from 4,492 to 3,812 seconds during cross-validation, saving 680 seconds,
and brought peak memory usage down from 44.1GB to 43.5GB. All enhancements to the quantum
pipeline, including caching, memory preallocation, and parallel tensor contractions, were built on this
enhanced baseline rather than the original, as observed in Table 3.</p>
        <p>This quantum advantage emerges specifically with transformer-based representations, revealing a
fundamental synergy between quantum kernels and modern neural embeddings. All Vision Transformer
variants demonstrate positive quantum advantage, indicating this is not merely an artifact of specific
architectural choices but rather reflects a deeper compatibility between quantum feature spaces and
transformer-learned representations.</p>
        <p>The results highlight the critical importance of feature representation selection in quantum machine
learning. Traditional approaches using raw pixels or CNN-based features consistently favor classical
methods, while transformer embeddings unlock quantum computational advantages. This finding
has important implications for the design of quantum machine learning systems, suggesting that the
preprocessing stage is as crucial as the quantum algorithm itself for achieving quantum advantage.</p>
        <p>The practical significance of up to 8% accuracy improvement represents substantial value for
real-world applications, particularly in domains where high precision is critical, such as medical diagnosis or
safety-critical systems. These gains, while seemingly modest, can translate to significant improvements
in deployment scenarios where accuracy differences directly impact outcomes.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Cross-Validation Performance Analysis</title>
        <p>Comprehensive 5-fold cross-validation results confirm the robustness of our quantum advantage findings
across multiple performance metrics. Table 3 presents detailed performance evaluation including
accuracy, precision, F1-score, AUC, runtime, and memory usage for both classical baselines and
quantum-enhanced models.</p>
        <p>The best-performing quantum model, QSVM using ViT-L/14@336-768, achieves an accuracy of 97.6%
on MNIST and 84.1% on Fashion-MNIST, clearly surpassing the pixel-based baselines, which level off
at around 88.2% and 72.5%, respectively. The near-perfect AUC scores of 99.9% across all ViT-based
quantum models suggest that these approaches reliably capture discriminative patterns with minimal
classification errors.</p>
        <p>The consistent improvement across cross-validation folds is especially notable. Accuracy standard
deviations remain low, typically between ±0.003 and ±0.020, showing that the observed quantum
advantage is stable and reproducible, rather than the result of favorable data splits or initialization. This
level of consistency is vital for deployment in practical settings where reliable performance is required.</p>
        <p>In addition, the quantum models display strong precision-recall alignment, with precision scores
closely tracking overall accuracy across all configurations. This balanced performance suggests that
quantum kernels provide meaningful gains in class-level discrimination, rather than boosting accuracy
by favoring specific categories.</p>
        <p>The confusion matrices in Figures 7 and 8 further illustrate the generalization power of our top model,
QSVM with ViT-L/14@336-768, showing alignment between cross-validation and held-out test results.
Generalization is stronger for MNIST than for Fashion-MNIST. The clear diagonal structure and few
off-diagonal errors confirm that the high accuracy reflects true performance across all digit classes, not
just select ones.</p>
        <p>Violin plots in Figures 3 and 4 visualize test accuracy distributions over cross-validation folds. QSVM
models with ViT embeddings, including ViT-B/16, ViT-L/14, and ViT-L/14@336, consistently achieve
higher average accuracies and lower variance compared to both baselines and EfficientNet-based
QSVMs. ViT-B/16-512 and ViT-L/14@336-768 show especially narrow, high-accuracy distributions on
MNIST, while baseline and EfficientNet models display wider, more variable spreads, particularly on
Fashion-MNIST. These results highlight the advantage of transformer embeddings in delivering stable
and accurate quantum classification across tasks of varying complexity.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Computational Efficiency and Scalability Analysis</title>
        <p>Our embedding-enhanced quantum models demonstrate strong accuracy while maintaining reasonable
computational demands for quantum simulations. Most ViT-based quantum configurations complete
training and evaluation in approximately 3,800 seconds with consistent memory usage around 43GB,
as detailed in Table 3. While these runtimes may appear substantial, they represent a significant
improvement over prior quantum simulations and are reasonable given the high-dimensional embedding
spaces and tensor contraction overhead inherent to quantum kernel methods.</p>
        <p>Among the top-performing models, QSVM with ViT-B/16-512 offers the optimal balance between
performance and eficiency, achieving 97.3% accuracy with the fastest runtime of 3,763 seconds.
Figures 5 and 6 illustrate the trade-offs between computational cost and classification performance. Vision
Transformer-based models consistently achieve top-tier accuracy with moderate computational
requirements, while EfficientNet configurations provide competitive accuracy with reduced resource demands.
The original Baseline model shows the least favorable performance-eficiency balance, while Baseline+
demonstrates consistent runtime improvements.</p>
        <p>These results confirm that embedding-enhanced quantum models offer practical scalability alongside
significant accuracy gains. Vision Transformer embeddings clearly outperform EfficientNet-B3 across
both datasets, and the computational overhead of quantum kernel methods is effectively mitigated by
the reduced dataset sizes achieved through strategic data distillation, making these approaches viable
for real-world deployment.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>This study demonstrates that quantum advantage in machine learning emerges not from quantum
algorithms alone, but from the strategic synergy between quantum kernels and appropriate feature
representations. Our central finding reveals that Vision Transformer embeddings uniquely unlock
quantum advantage, achieving up to 8.02% accuracy improvements over classical SVMs, while CNN
features and raw pixels consistently favor classical approaches.</p>
      <p>[Figure: (a) Validation fold (best CV model) and (b) held-out test set (best CV model) for the MNIST dataset]</p>
      <p>[Figure: (a) Validation fold (best CV model) and (b) held-out test set (best CV model) for the Fashion-MNIST dataset]</p>
      <p>Although quantum simulations demand substantial computational resources (approximately 3,800
seconds for training), this investment proves justified in high-precision applications where accuracy
improvements directly translate to enhanced outcomes, particularly in medical diagnosis, safety-critical
systems, and fraud detection scenarios. The computational overhead becomes advantageous compared
to scaling classical approaches for similar accuracy gains, which typically require exponentially larger
datasets or increasingly complex architectures. Our strategic data distillation effectively reduces problem
complexity from O(70000²) to O(1600²) kernel evaluations, making quantum kernel methods tractable
while preserving essential dataset characteristics.</p>
      <p>Our framework demonstrates that quantum machine learning achieves scalability through intelligent
preprocessing, with distillation parameters easily customizable based on available computational
resources. This adaptability enables flexible deployment across diverse scenarios, from
resource-constrained environments utilizing smaller distilled datasets to high-performance settings leveraging
full quantum simulation capabilities. The embedding-aware approach establishes a practical pathway
toward quantum advantage that becomes increasingly favorable as quantum hardware continues to
mature.</p>
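      <p>The configurable distillation step can be sketched as follows; this is a minimal class-balanced subsampling sketch with assumed names and a tunable per-class budget, not the released pipeline code:</p>
      <preformat>
```python
# Minimal sketch of class-balanced data distillation with a tunable
# per-class budget; function and parameter names are illustrative.
import numpy as np

def class_balanced_subset(X, y, per_class, seed=0):
    """Keep an equal number of randomly chosen samples from each class."""
    rng = np.random.default_rng(seed)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        keep.extend(rng.choice(idx, size=per_class, replace=False))
    keep = np.sort(np.asarray(keep))
    return X[keep], y[keep]

# e.g. 10 classes x 160 samples per class yields a 1,600-point distilled set.
```
      </preformat>
      <p>Adjusting <monospace>per_class</monospace> to the available compute budget is what makes the quadratic kernel cost tunable across deployment scenarios.</p>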
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>We present an embedding-aware quantum-classical framework that systematically addresses scalability
challenges in quantum machine learning by strategically combining class-balanced data distillation
with pretrained embeddings. Building upon Chen et al.’s GPU-accelerated quantum kernel method [21],
our pipeline successfully reduces computational complexity while achieving measurable performance
improvements over classical baselines.</p>
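      <p>The overall shape of the pipeline can be sketched as below. This is a NumPy-only stand-in under stated assumptions: the simulated quantum fidelity kernel is replaced by a classical RBF kernel, the PCA width of 16 mirrors the 16-qubit circuit, and all names are illustrative rather than the authors' implementation:</p>
      <preformat>
```python
# Sketch of the hybrid pipeline: pretrained embeddings -> PCA to a
# qubit-sized feature count -> pairwise kernel matrix for a
# precomputed-kernel SVM. RBF here stands in for the quantum kernel.
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 768))   # stand-in for ViT [CLS] embeddings

# PCA via SVD: project onto the top 16 principal components (~16 qubits).
centered = emb - emb.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
feats = centered @ vt[:16].T        # shape (200, 16)

# Pairwise squared distances -> RBF kernel matrix (quantum-kernel stand-in).
sq = np.sum(feats**2, axis=1)
d2 = sq[:, None] + sq[None, :] - 2.0 * feats @ feats.T
K = np.exp(-0.1 * d2)               # gamma = 0.1, illustrative

# K would then be passed to an SVM configured with kernel="precomputed".
```
      </preformat>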
      <p>Our work delivers the first systematic evidence that quantum kernel advantage depends critically on
embedding choice, revealing fundamental compatibility between transformer attention mechanisms
and quantum feature spaces. Through 16-qubit tensor network simulation, we demonstrate consistent
quantum advantages using ViT embeddings across MNIST (up to 4.42% improvement) and
Fashion-MNIST (up to 8.02% improvement), while observing performance degradation with CNN-based features.</p>
      <p>The embedding-aware QSVM framework enables practical quantum machine learning deployment
through configurable data distillation and hardware-adaptive preprocessing. Compared to raw inputs,
structured transformer embeddings consistently deliver superior accuracy and generalization, effectively
supporting real-world applications in high-dimensional classification tasks where precision remains
critical.</p>
      <p>Limitations and Future Directions. Several limitations require attention for broader impact. Our
evaluation concentrates on relatively simple visual classification benchmarks (MNIST, Fashion-MNIST);
validation on complex datasets such as CIFAR-10, medical imaging, or domain-specific applications
remains necessary to assess generalization. The theoretical foundations explaining transformer-quantum
synergy remain largely underexplored, presenting compelling opportunities for fundamental research.</p>
      <p>Future work should pursue automated embedding and kernel selection strategies [35] to eliminate
manual hyperparameter tuning, explore sophisticated dimensionality reduction techniques beyond
PCA to better preserve semantic information, and develop optimized quantum circuit designs [36]
for enhanced computational efficiency. Expanding empirical validation to medical imaging and other
high-dimensional domains will prove critical for demonstrating broader practical utility.</p>
      <p>This work establishes that achieving quantum advantage requires careful algorithm-representation
co-design rather than naive application of quantum methods. Our embedding-aware framework
provides both immediate practical value for precision-critical applications and a scalable foundation for
quantum machine learning that effectively leverages modern neural architectures. As quantum hardware
continues to mature, this approach offers a viable pathway toward practical quantum advantage in
real-world machine learning applications.</p>
    </sec>
    <sec id="sec-8">
      <title>8. Data and Code Availability</title>
      <p>The code used in this study is publicly available at: https://github.com/sebasmos/QuantumVE.</p>
    </sec>
    <sec id="sec-9">
      <title>9. Acknowledgments</title>
      <p>This work was supported by the Google Cloud Research Credits program under the award number
GCP19980904.</p>
    </sec>
    <sec id="sec-10">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
      <p>[13] D. Cugini, D. Gerace, P. Govoni, A. Perego, D. Valsecchi, Comparing quantum and classical machine learning for vector boson scattering background reduction at the large hadron collider, Quantum Machine Intelligence 5 (2023) 35.
[14] Y. Gujju, A. Matsuo, R. Raymond, Quantum machine learning on near-term quantum devices: Current state of supervised and unsupervised techniques for real-world applications, Physical Review Applied 21 (2024) 067001.
[15] D. Basilewitsch, J. F. Bravo, C. Tutschku, F. Struckmeier, Quantum neural networks in practice: A comparative study with classical models from standard data sets to industrial images, arXiv preprint arXiv:2411.19276 (2024).
[16] K. Phalak, A. Ghosh, S. Ghosh, Optimizing quantum embedding using genetic algorithm for QML applications, arXiv preprint arXiv:2412.00286 (2024).
[17] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, R. Wolf, Training deep quantum neural networks, Nature Communications 11 (2020) 808.
[18] C. Havenstein, D. Thomas, S. Chandrasekaran, Comparisons of performance between quantum and classical machine learning, SMU Data Science Review 1 (2018) 11.
[19] R. Potempa, S. Porebski, Comparing concepts of quantum and classical neural network models for image classification task, in: Progress in Image Processing, Pattern Recognition and Communication Systems: Proceedings of the Conference (CORES, IP&amp;C, ACS), June 28–30, 2021, 12, Springer, 2022, pp. 61–71.
[20] NVIDIA Corporation, NVIDIA cuTensorNet: High-Performance Tensor Network Library, 2024. https://docs.nvidia.com/cuda/cuquantum/latest/cutensornet/index.html.
[21] K.-C. Chen, T.-Y. Li, Y.-Y. Wang, S. See, C.-C. Wang, R. Wille, N.-Y. Chen, A.-C. Yang, C.-Y. Lin, Validating large-scale quantum machine learning: Efficient simulation of quantum support vector machines using tensor networks, Machine Learning: Science and Technology (2024).
[22] K. Mitarai, M. Negoro, M. Kitagawa, K. Fujii, Quantum circuit learning, Physical Review A 98 (2018) 032309.
[23] E. Farhi, H. Neven, Classification with quantum neural networks on near term processors, arXiv preprint arXiv:1802.06002 (2018).
[24] D. Sharma, P. Singh, A. Kumar, The role of entanglement for enhancing the efficiency of quantum kernels towards classification, Physica A: Statistical Mechanics and its Applications 625 (2023) 128938.
[25] B.-S. Chen, J.-L. Chern, Generating quantum feature maps for SVM classifier, arXiv preprint arXiv:2207.11449 (2022).
[26] C. Blank, D. K. Park, J.-K. K. Rhee, F. Petruccione, Quantum classifier with tailored quantum kernel, npj Quantum Information 6 (2020) 41.
[27] W. E. Maouaki, T. Said, M. Bennai, Quantum support vector machine for prostate cancer detection: A performance analysis, arXiv preprint arXiv:2403.07856 (2024).
[28] M. Tan, Q. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 6105–6114.
[29] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint arXiv:2010.11929 (2020).
[30] A. Javadi-Abhari, M. Treinish, K. Krsulich, C. J. Wood, J. Lishman, J. Gacon, S. Martiel, P. D. Nation, L. S. Bishop, A. W. Cross, et al., Quantum computing with Qiskit, arXiv preprint arXiv:2405.08810 (2024).
[31] M. Schuld, N. Killoran, Quantum machine learning in feature Hilbert spaces, Physical Review Letters 122 (2019) 040504.
[32] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998) 2278–2324.
[33] H. Xiao, K. Rasul, R. Vollgraf, Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms, arXiv preprint arXiv:1708.07747 (2017).
[34] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al., Learning transferable visual models from natural language supervision, in: International Conference on Machine Learning, PMLR, 2021, pp. 8748–8763.
[35] M. Incudini, D. L. Bosco, F. Martini, M. Grossi, G. Serra, A. D. Pierro, Automatic and effective discovery of quantum kernels, IEEE Transactions on Emerging Topics in Computational Intelligence (2024) 1–10. URL: http://dx.doi.org/10.1109/TETCI.2024.3499993. doi:10.1109/tetci.2024.3499993.
[36] L. Sünkel, D. Martyniuk, D. Mattern, J. Jung, A. Paschke, GA4QCO: Genetic algorithm for quantum circuit optimization, 2023. URL: https://arxiv.org/abs/2302.01303. arXiv:2302.01303.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bausch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Senior</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>Heras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Edlich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Davies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Newman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Satzinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Y.</given-names>
            <surname>Niu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Blackwell</surname>
          </string-name>
          , et al.,
          <article-title>Learning high-accuracy error decoding for quantum processors</article-title>
          ,
          <source>Nature</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Peral-García</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cruz-Benito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. J.</given-names>
            <surname>García-Peñalvo</surname>
          </string-name>
          ,
          <article-title>Systematic literature review: Quantum machine learning and its applications</article-title>
          ,
          <source>Computer Science Review</source>
          <volume>51</volume>
          (
          <year>2024</year>
          )
          <fpage>100619</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Havlíček</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Córcoles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Temme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. W.</given-names>
            <surname>Harrow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kandala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Chow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Gambetta</surname>
          </string-name>
          ,
          <article-title>Supervised learning with quantum-enhanced feature spaces</article-title>
          ,
          <source>Nature</source>
          <volume>567</volume>
          (
          <year>2019</year>
          )
          <fpage>209</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Abbas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zoufal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lucchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Figalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Woerner</surname>
          </string-name>
          ,
          <article-title>The power of quantum neural networks</article-title>
          ,
          <source>Nature Computational Science</source>
          <volume>1</volume>
          (
          <year>2021</year>
          )
          <fpage>403</fpage>
          -
          <lpage>409</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Senokosov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sedykh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sagingalieva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kyriacou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Melnikov</surname>
          </string-name>
          ,
          <article-title>Quantum machine learning for image classification</article-title>
          ,
          <source>Machine Learning: Science and Technology</source>
          <volume>5</volume>
          (
          <year>2024</year>
          )
          <fpage>015040</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>K.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Jobst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shishenina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pollmann</surname>
          </string-name>
          ,
          <article-title>Classification of the fashion-mnist dataset on a quantum computer</article-title>
          ,
          <source>arXiv preprint arXiv:2403.02405</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A. K. K.</given-names>
            <surname>Don</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Khalil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Atiquzzaman</surname>
          </string-name>
          ,
          <article-title>A fusion of supervised contrastive learning and variational quantum classifiers</article-title>
          ,
          <source>IEEE Transactions on Consumer Electronics</source>
          <volume>70</volume>
          (
          <year>2024</year>
          )
          <fpage>770</fpage>
          -
          <lpage>779</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>V.</given-names>
            <surname>Belis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Odagiu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Grossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Reiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dissertori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vallecorsa</surname>
          </string-name>
          ,
          <article-title>Guided quantum compression for high dimensional data classification</article-title>
          ,
          <source>Machine Learning: Science and Technology</source>
          <volume>5</volume>
          (
          <year>2024</year>
          )
          <fpage>035010</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>X.</given-names>
            <surname>Vasques</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cif</surname>
          </string-name>
          ,
          <article-title>Application of quantum machine learning using quantum kernel algorithms on multiclass neuron m-type classification</article-title>
          ,
          <source>Scientific Reports</source>
          <volume>13</volume>
          (
          <year>2023</year>
          )
          <fpage>11541</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. K.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <article-title>Classical-to-quantum convolutional neural network transfer learning</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>555</volume>
          (
          <year>2023</year>
          )
          <fpage>126643</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>H.-Y.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Broughton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Cotler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mohseni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Neven</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Babbush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kueng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Preskill</surname>
          </string-name>
          , et al.,
          <article-title>Quantum advantage in learning from experiments</article-title>
          ,
          <source>Science</source>
          <volume>376</volume>
          (
          <year>2022</year>
          )
          <fpage>1182</fpage>
          -
          <lpage>1186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gentinetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Thomsen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Sutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Woerner</surname>
          </string-name>
          ,
          <article-title>The complexity of quantum support vector machines</article-title>
          ,
          <source>Quantum</source>
          <volume>8</volume>
          (
          <year>2024</year>
          )
          <fpage>1225</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>