<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Parameter Prediction for Variational Quantum Algorithms through Sequence Modeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Corrado Loglisci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vito Losavio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Berardina De Carolis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marjana Skenduli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donato Malerba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Department of Computer Science, Università degli Studi di Bari 'Aldo Moro'</institution>
          ,
          <addr-line>Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>NISQ devices represent a major step in quantum computing, though they face limitations such as short coherence times and high error rates. Variational Quantum Algorithms (VQAs) help overcome these limitations by optimizing parameterized quantum circuits, but challenges like the barren plateau problem and the time spent in iterative procedures remain. Predictive machine learning can improve VQAs by providing efficient parameter initialization and estimation. However, traditional ML models for parameter prediction assume static data and batch processing, which is problematic in dynamic environments. Sequential ML models address this by adapting the predictor to time-varying data and capturing correlations over sequences of data, improving the reliability of the predictions of the parameters over time and reducing the time consumption that a conventional VQA would require.</p>
      </abstract>
      <kwd-group>
        <kwd>Variational quantum algorithms</kwd>
        <kwd>Parameterized quantum circuits</kwd>
        <kwd>Parameter estimation</kwd>
        <kwd>Sequence modeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Effective initialization and estimation of the circuit parameters can significantly improve the performance of VQAs by guiding them towards viable subspaces and
increasing the likelihood of successfully navigating the optimization landscape.</p>
      <p>To achieve that, it is important to rely on computational solutions capable of making non-intuitive
decisions, exploring complex high-dimensional spaces, and identifying patterns and correlations that
are not immediately obvious. It is also desirable that these solutions are robust to noise and errors
while reducing the time and resources needed for empirical trial-and-error methods [3]. This is where
machine learning (ML) algorithms can play a decisive role. ML models can systematically provide
parameter estimations, efficiently search for optimal parameters, and offer substantial improvements in
the execution of quantum algorithms.</p>
      <p>The problem has only recently received significant attention from the research community. Most
efforts have focused on using predictive ML models to initialize the parameters of quantum-classical
optimizers, such as the Quantum Approximate Optimization Algorithm (QAOA) [4, 2]. These approaches
typically rely on batch-style computation, where the data to train the model are all available at one time
and remain unchanged throughout the process. The predicted parameter values for the optimizer reflect
this assumption of data stability. However, in many real-world applications, data are not available
all at once; especially in dynamic and time-varying environments, data arrive in sequence, and their
properties may change [5, 6, 7] or may have some form of dependency or correlation with those of the
data that have previously arrived [8]. So, the reliability of an ML model trained on the data available
at the moment diminishes when it is later used on newly arrived data. To address these issues, the
time-variability of data properties and the correlation between data points should be explicitly considered
when training the model. Sequential ML models offer a solution by building predictors that account for the
structure of sequential relationships between data, allowing them to capture patterns and correlations over
time or in ordered data [9].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Contribution</title>
      <p>In this work, we explore a paradigm shift in the design of VQAs by studying how ML can enhance the
construction of VQA models on sequential data. Specifically, we study how predictive ML techniques
can assist conventional optimizers in forecasting the parameter values of parameterized quantum
circuits. The goal is to help optimizers achieve convergence or provide optimal values with fewer
iterations compared to using optimization alone. This benefit becomes even more significant when
conventional optimizers are applied to sequential data, as they would need to be executed from scratch
repeatedly on newly acquired data, leading to time-consuming calculations. At each execution, the
optimizer might even face the barren plateau problem.</p>
      <p>Our proposal stems from recognizing the intrinsic property of correlation in sequential data.
Addressing it is crucial not only for designing the appropriate machine learning algorithm to train
the forecasting model but also for improving the accuracy of the predicted parameters. This correlation
typically manifests in two forms: temporal locality and temporal recurrence [10, 11]. Temporal locality
suggests that recent data is more strongly correlated with, or predictive of, future data compared to older
data. Temporal recurrence refers to the regular repetition of certain patterns or sequences over time.
While temporal locality emphasizes short-term influence, temporal recurrence focuses on patterns that
reappear at regular intervals over longer periods. Consequently, this correlation is mirrored in the
behavior of the optimizing function during exploration of the parameter space: such a function
replicates its response when these recurring patterns in the data emerge. Based on this, we propose
modeling the behavior of the optimizer based on the parameter values it generates while minimizing the
loss function, rather than focusing directly on the input data. The loss function in this work measures
the performance of a VQA, specifically in a binary classification task, by quantifying the error between
the class provided by the VQA and the actual label. We conjecture that training the forecasting model
using the parameter values obtained during loss minimization could serve as a reliable strategy, offering
valuable guidance to the optimizer for future classification runs over sequential data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>To combine both forms of correlation in a machine learning (ML) model capable of forecasting the
parameters for future optimizer runs, we propose a computational solution based on sequence modeling
neural networks for time-series forecasting, as illustrated in Figure 1. This solution is structured around
two interrelated methods.</p>
      <p>The first method addresses the binary classification task handled by a Variational Quantum Algorithm
(VQA) and operates directly on sequential data. The data arrive at regular intervals but in a scattered
manner, without a predefined order between labeled and unlabeled instances. As the data are received,
they are collected into a fixed-size data container, storing both labeled and unlabeled data. Using this
container helps alleviate the processing load caused by the data arrival rate. So, the first method is able
to train the binary classifier (through the VQA) from the labelled data and to assign labels to the
unlabelled data.</p>
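      <p>To make the container concrete, the following is a minimal Python sketch of such a fixed-size buffer; the class and method names (DataContainer, add, split) are our own illustration, not identifiers from the paper's implementation.</p>
      <preformat>
from collections import deque

class DataContainer:
    """Fixed-size buffer for incoming labelled and unlabelled instances."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest instances drop out first

    def add(self, features, label=None):
        # label is None for unlabelled instances arriving in the stream
        self.buffer.append((features, label))

    def split(self):
        # separate the container into training data and data to be classified
        labelled = [(x, y) for x, y in self.buffer if y is not None]
        unlabelled = [x for x, y in self.buffer if y is None]
        return labelled, unlabelled
      </preformat>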
      <p>During training, the VQA assigns the values predicted by the second method (forecasting model) to
its parameters. Then, it tries to refine them along a pre-defined number (n) of iterations on the labelled
data. More precisely, we implemented two variants: the first one (afterwards, For+It) initializes the
parameters based on the forecasting model and then refines them over n iterations; the second one
(afterwards, It+For+It) runs n/2 iterations of the optimizer, then calls the forecasting
model, and, finally, once again runs n/2 iterations. The binary classifier so learned is then
applied to the unlabeled data.</p>
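      <p>The two variants can be summarized by the following Python sketch; the optimizer and forecaster interfaces (minimize, predict_next, update) are hypothetical placeholders for COBYLA and for the LSTM forecasting model described below.</p>
      <preformat>
def for_it(forecaster, optimizer, loss, n):
    # For+It: initialize from the forecast, then refine for n iterations
    theta = forecaster.predict_next()
    return optimizer.minimize(loss, theta, maxiter=n)

def it_for_it(forecaster, optimizer, loss, theta0, n):
    # It+For+It: n/2 iterations, forecast, then n/2 more iterations
    theta = optimizer.minimize(loss, theta0, maxiter=n // 2)
    forecaster.update(theta)          # intermediate values feed the forecaster
    theta = forecaster.predict_next()
    return optimizer.minimize(loss, theta, maxiter=n // 2)
      </preformat>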
      <p>As an initialization for the entire process, we consider a number (S) of training sessions on the
labelled data accumulated in the container. Each initial training session runs for I optimizer iterations,
where I is chosen to be greater than n in order to build reliable parameter values, which will be used to
feed the forecasting model. Specifically, we chose the k iterations (out of I) in which the loss function of
the classifier (first method) takes its lowest values. The rationale is to make the forecasting model learn
the behavior of the optimizing function in the minimization of the objective function. Eventually, we obtain k
parameter configurations from each of the S training sessions; each configuration presents a number
P of values equal to the number of parameterized gates of the VQA.</p>
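      <p>A possible selection step, under the assumption that the optimizer exposes the parameter vector and loss value of every iteration, is sketched below in Python (the helper name select_k_best is ours).</p>
      <preformat>
import numpy as np

def select_k_best(params_history, loss_history, k):
    """params_history: (I, P) array, one parameter vector per iteration;
    loss_history: (I,) array of loss values for the same iterations."""
    best = np.argsort(loss_history)[:k]  # indices of the k lowest losses
    best = np.sort(best)                 # keep them in temporal order
    return params_history[best]          # (k, P) selected configurations
      </preformat>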
      <p>The forecasting model consists of P LSTM components, each assigned to one parameter and trained
on the values generated during the training sessions. Each LSTM processes subsequences extracted
from an overall sequence S × k time steps long (k time steps for each of the S training sessions).
Each subsequence contains T time steps, and the LSTM predicts a single numeric value within the
range [0, 2π], based on the previous T time steps. Subsequences are formed by shifting the previous
subsequence by one time step. The LSTM learns by minimizing the prediction error over these shifted
subsequences.</p>
      <p>So, given a sequence ⟨v_1, v_2, . . . , v_{S×k}⟩ representing the entire time span from the training sessions,
we extract overlapping subsequences of length T by shifting each subsequence by one time step:
⟨v_1, v_2, . . . , v_T⟩ → v_{T+1},
⟨v_2, v_3, . . . , v_{T+1}⟩ → v_{T+2}, . . . ,
⟨v_{(S×k)−T}, . . . , v_{(S×k)−1}⟩ → v_{S×k}.</p>
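      <p>In Python, the sliding-window extraction for one parameter can be sketched as follows (make_subsequences is an illustrative helper, not the paper's code).</p>
      <preformat>
import numpy as np

def make_subsequences(sequence, T):
    """Build (input, target) pairs from the S*k-long sequence of values."""
    X, y = [], []
    for start in range(len(sequence) - T):
        X.append(sequence[start:start + T])  # window v_start ... v_{start+T-1}
        y.append(sequence[start + T])        # the next value to predict
    return np.asarray(X), np.asarray(y)
      </preformat>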
      <p>1 ∑︀=1 ∑︀=− 1 (︀ ^, − ,+ ︀) 2
The LSTM is trained by minimizing the mean squared error (MSE), between the predicted values ^,
and the true values ,+ :
Generally, speaking a LSTM neural network consists of units each comprising three kinds of gates,
forget gate, input gate, and output gate and keeps two model snapshots updated over processing time
stepped data, hidden state and cell state. Forget gate filters out irrelevant parts of the previous cell
state. Input gate decides what new information will be added to the cell state. Output gate determines
the next hidden state based on the updated cell state. The cell state represents the long-term memory
and is updated by combining the forget gate and input gate information. The hidden state represents
the short-term memory, which is used to make predictions. They operate at each time-step as here
formulated:
• Forget gate:  =  ( + ℎ− 1 +  )
• Input gate:  =  ( + ℎ− 1 + )
• Cell state update:  =  ⊙ − 1 +  ⊙ ˜ where ⊙ denotes element-wise multiplication,
˜ = tanh( + ℎ− 1 + ), ℎ− 1 is the previous hidden state vector
• Output gate:  =  ( + ℎ− 1 + ), with hidden state ℎ updated as: ℎ =  ⊙ tanh().
where,
•  is the number of units used in the whole model</p>
      <p>
• σ is the sigmoid activation function, σ(x) = 1 / (1 + e^{−x})
• tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x}) is the hyperbolic tangent activation function
• f_t, i_t, o_t ∈ (0, 1)^d, c̃_t, h_t ∈ (−1, 1)^d, c_t ∈ R^d
• W_f, W_i, W_c, W_o ∈ R^{d×1} are weight matrices on the input values
• U_f, U_i, U_c, U_o ∈ R^{d×d} are weight matrices on the hidden state
• b_f, b_i, b_c, b_o ∈ R^d are bias vectors learned during training.
      </p>
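      <p>For concreteness, a direct NumPy rendering of the gate equations above for a single time step is given below; it assumes already-learned weights, a scalar input value per time step, and d hidden units.</p>
      <preformat>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W['f'|'i'|'c'|'o']: (d,) input weights, U[...]: (d, d) recurrent
    weights, b[...]: (d,) biases; x_t is one scalar parameter value."""
    f = sigmoid(W['f'] * x_t + U['f'] @ h_prev + b['f'])     # forget gate
    i = sigmoid(W['i'] * x_t + U['i'] @ h_prev + b['i'])     # input gate
    c_tilde = np.tanh(W['c'] * x_t + U['c'] @ h_prev + b['c'])
    c = f * c_prev + i * c_tilde                             # cell state update
    o = sigmoid(W['o'] * x_t + U['o'] @ h_prev + b['o'])     # output gate
    h = o * np.tanh(c)                                       # new hidden state
    return h, c
      </preformat>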
      <p>
        After processing T time steps, the LSTM predicts the next numeric value v̂ ∈ [0, 2π]. This is done by
applying a linear transformation to the hidden state h_T at the last time step t = T:
v̂_{T+1} = h_T W_y + b_y
(1)
where W_y ∈ R^{d×1} and b_y ∈ R are the output weight matrix and bias, respectively.
      </p>
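      <p>Since the classical LSTM is implemented with TensorFlow [13], one of the P per-parameter forecasters could be built as in the following sketch, where the Dense(1) layer plays the role of the linear head of Equation (1); the hyperparameter choices shown are assumptions, not the paper's settings.</p>
      <preformat>
import tensorflow as tf

def build_forecaster(T, d=32):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(T, 1)),  # T scalar time steps
        tf.keras.layers.LSTM(d),              # final hidden state h_T, (batch, d)
        tf.keras.layers.Dense(1),             # v_hat = h_T W_y + b_y, Eq. (1)
    ])
    model.compile(optimizer='adam', loss='mse')  # ADAM + MSE, as in the paper
    return model

# X, y = make_subsequences(sequence, T)
# model = build_forecaster(T=4)
# model.fit(X[..., None], y, epochs=200, verbose=0)
      </preformat>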
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>The proposed solution has been implemented using IBM Qiskit [12], and is designed to work across a
broad range of VQAs without loss of generality. For the current experiments, we used the quantum
circuit shown in Fig. 2, which is commonly used as an ansatz in quantum machine learning approaches.
This circuit consists of 8 qubits and an equal number of gates per qubit. The design ensures that the
parameterized gates are influenced not only by the encoding of classical data but also by two-qubit
operations. The circuit includes 32 parameters to be optimized, corresponding to a forecasting model
comprising 32 LSTMs.</p>
      <p>The VQA leverages the pre-existing COBYLA optimizer for parameter tuning. For the forecasting
model, we tested two implementations: a classical version using a standard LSTM [13] (referred to as
LSTM) and a hybrid quantum-classical implementation [14] (referred to as QLSTM), both trained with
the gradient descent-based ADAM optimization algorithm. We conducted experiments using the Aer
simulator on the real-world Spambase dataset, which contains 4600 data instances and 57 features1. A
classical feature selection technique was applied to reduce the feature set to eight.</p>
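      <p>As a hedged illustration of the setup (the actual ansatz is the circuit of Fig. 2, which is not reproduced here), an 8-qubit ansatz with 32 trainable parameters can be obtained in Qiskit with an EfficientSU2-style circuit, and COBYLA can be driven through SciPy.</p>
      <preformat>
from qiskit.circuit.library import ZZFeatureMap, EfficientSU2
from scipy.optimize import minimize

n_qubits = 8
feature_map = ZZFeatureMap(n_qubits)     # encodes the 8 selected features
ansatz = EfficientSU2(n_qubits, reps=1)  # 2 * 8 * (reps + 1) = 32 parameters
assert ansatz.num_parameters == 32

circuit = feature_map.compose(ansatz)

# loss(theta) would bind theta to the ansatz, run the circuit on the Aer
# simulator, and return the classification error on the labelled data:
# result = minimize(loss, theta0, method="COBYLA", options={"maxiter": 20})
      </preformat>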
      <p>We simulated the scenario of incrementally incoming data by arranging the dataset in equally-sized
data blocks, which were acquired one at a time and stored in the container. Each data block consisted of
75% labelled data and 25% unlabelled data. We considered three different numbers of data blocks, 20, 40,
and 60, corresponding to 230, 115, and 77 data instances per block respectively. The number of initial
training sessions S was chosen as a portion (20%) of all the data blocks; consequently, we have S equal
to 4, 8, and 12. We defined k (the number of time steps considered, in which the objective function takes
its lowest values) as 4. The number T of time steps was chosen as {2, 3, 4} for S=4, {5, 6, 7, 8} for S=8,
and {9, 10, 11, 12} for S=12. Longer subsequences (higher T) are obtained with higher values of S.
Orthogonally, the number of iterations I was set to 20, while the number of iterations n is 8 when I is 20.
To clarify, for the setting S=4, T=4, For+It, I=20, we trained the binary classifier over 4 training
sessions, each using the labelled portion (75%) of a 230-instance data block; the VQA works on 20
iterations during each training session; and the forecasting model learns on subsequences of 5 (T+1)
time steps, extracted from an overall sequence 4 × 4 (S × k) time steps long. Once the S data blocks
were processed, for the subsequent data blocks, the parameters of the VQA are either i) initialized using
the forecasting model and refined through 8 iterations, in the case of For+It, or ii) tuned over 4 iterations,
whose values feed the forecasting model, followed by another refinement round of 4 iterations, in the
case of It+For+It.</p>
      <p>Experiments were conducted to evaluate the accuracy in a binary classification task and to measure the
running times for different lengths, T, of the training subsequences. Table 1 presents the F1-scores,
calculated as the average across the individual values derived from the data blocks. The results show that the
introduction of a forecasting model generally improves the predictive accuracy of the binary classification
for all values of T, except for T=10 (in bold).</p>
      <p>It is important to note that not all predictors perform consistently. Specifically, the model LSTM
For+It n=8 shows improvements for T values in the range {2, 3, 4, 5, 6, 7, 8}, whereas the model
QLSTM For+It n=8 performs better for longer subsequences, specifically {9, 11, 12}. This suggests that
the classical LSTM model performs well on short to medium-length subsequences, while the quantum
LSTM model is more effective for the longer ones.</p>
      <p>Another observation concerns the model variants For+It and It+For+It, which define how the VQA
leverages the forecasting results. In all trials, the models using It+For+It consistently performed worse
than their counterparts, both for LSTM and QLSTM. This underperformance may be attributed to
the initial n/2 iterations, which, starting from a uniform setting, do not provide sufficient input for the
forecasting model to enhance performance.</p>
      <p>[Table 1: F1-scores of the plain VQA and of the variants LSTM For+It n=8, LSTM It+For+It n=8,
QLSTM For+It n=8, and QLSTM It+For+It n=8.]</p>
      <p>1. https://archive.ics.uci.edu/ml/datasets/Spambase</p>
      <p>[Figure 3: running times at different values of T; panels (a) S = 4, (b) S = 8, (c) S = 12.]</p>
      <p>Figure 3 shows the running times of the three main approaches executed on the simulator, averaged
at different values of T (the length of the subsequences). The predictors require more time only at the
beginning, corresponding to the S training sessions. Conversely, they require less time on all the
remaining data blocks, with savings of several hundreds in computation time. We observe that this
happens with both the LSTM models and the QLSTM models.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>Alleviating the workload of VQAs is a challenging and attractive research area aimed at developing
solutions that enable a broader range of practical applications demonstrating quantum advantages,
including quantum machine learning. In this work, we explore a new perspective in VQA design,
where the estimation of parameter values—traditionally handled using optimization techniques—can be
supported by predictive machine learning. The usefulness of this approach becomes even more evident
when the VQA is required to operate repeatedly, as in the case of continuous data flows. This work
focuses on simulation-based systems, without considering noise-related issues.</p>
      <p>While our experiments are limited to a single case study and are not exhaustive, the results are
encouraging. They suggest that integrating forecasting models into VQAs improves both the quality and
effectiveness of the algorithms. However, not all predictors and experimental configurations guarantee
this improvement, a topic we will investigate further. For instance, we plan to study the interactions
between parameters in multivariate time-series data, such as with LSTM or QLSTM models.</p>
      <sec id="sec-5-1">
        <title>Acknowledgment</title>
        <p>Corrado Loglisci and Marjana Skenduli acknowledge the financial support from the project "PNRR
MUR project PE0000023-NQSTI" for this research.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[1] L. Madsen, F. Laudenbach, M. Askarani, F. Rortais, T. Vincent, J. Bulmer, F. Miatto, L. Neuhaus,
L. Helt, M. Collins, A. Lita, T. Gerrits, S. W. Nam, V. Vaidya, M. Menotti, I. Dhand, Z. Vernon,
N. Quesada, J. Lavoie, Quantum computational advantage with a programmable photonic processor
(2022). doi:10.1038/s41586-022-04725-x.
[2] Z. Liang, G. Liu, Z. Liu, J. Cheng, T. Hao, K. Liu, H. Ren, Z. Song, J. Liu, F. Ye, et al., Graph
learning for parameter prediction of quantum approximate optimization algorithm, arXiv preprint
arXiv:2403.03310 (2024).
[3] J. Falla, Q. Langfitt, Y. Alexeev, I. Safro, Graph representation learning for parameter transferability
in quantum approximate optimization algorithm, Quantum Machine Intelligence 6 (2024) 46.
[4] F. Meng, X. Zhou, Parameter generation of quantum approximate optimization algorithm with
diffusion model, arXiv preprint arXiv:2407.12242 (2024).
[5] C. Loglisci, D. Malerba, S. Pascazio, Quarta: quantum supervised and unsupervised learning
for binary classification in domain-incremental learning, Quantum Mach. Intell. 6 (2024) 68.
doi:10.1007/s42484-024-00196-7.
[6] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, S. Wermter, Continual lifelong learning with neural
networks: A review, Neural Networks 113 (2019) 54–71. doi:10.1016/j.neunet.2019.01.012.
[7] C. Loglisci, D. Malerba, Coupling quantum classification and quantum distance estimation in
continual learning, in: AIQxQIA@AI*IA, volume 3586 of CEUR Workshop Proceedings,
CEUR-WS.org, 2023.
[8] H. Homayouni, S. Ghosh, I. Ray, S. Gondalia, J. Duggan, M. G. Kahn, An autocorrelation-based
LSTM-autoencoder for anomaly detection on time-series data, in: 2020 IEEE International Conference on
Big Data (Big Data), 2020, pp. 5068–5077. doi:10.1109/BigData50022.2020.9378192.
[9] Z. Xie, Y. Yang, Y. Zhang, J. Wang, S. Du, Deep learning on multi-view sequential data: a survey,</p>
      <p>Artif. Intell. Rev. 56 (2023) 6661–6704.
[10] R. A. Rossi, Relational time series forecasting, The Knowledge Engineering Review 33 (2018) e1.</p>
      <p>doi:10.1017/S0269888918000024.
[11] C. Loglisci, I. Diliso, D. Malerba, A hybrid quantum-classical framework for binary classification
in online learning, in: SEBD, volume 3478 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp.
88–99.
[12] M. S. Anis, et al., Qiskit: An open-source framework for quantum computing, 2021. doi:10.5281/
zenodo.2573505.
[13] M. Abadi, et al., Tensorflow: Large-scale machine learning on heterogeneous systems, https:
//www.tensorflow.org/, 2015.
[14] S. Y.-C. Chen, S. Yoo, Y.-L. L. Fang, Quantum long short-term memory, 2020. URL: https://arxiv.
org/abs/2009.01783. arXiv:2009.01783.</p>
    </sec>
  </body>
  <back>
  </back>
</article>