<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>Parameter Prediction for Variational Quantum Algorithms through Sequence Modeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Corrado Loglisci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vito Losavio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Berardina De Carolis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marjana Skenduli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donato Malerba</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Department of Computer Science, Università degli Studi di Bari 'Aldo Moro'</institution>
          ,
          <addr-line>Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>NISQ devices represent a major step in quantum computing, though they face limitations such as short coherence times and high error rates. Variational Quantum Algorithms (VQAs) help overcome these limitations by optimizing parameterized quantum circuits, but challenges like the barren plateau problem and the time spent in iterative procedures remain. Predictive machine learning can improve VQAs by providing efficient parameter initialization and estimation. However, traditional ML models for parameter prediction assume static data and batch processing, which is problematic in dynamic environments. Sequential ML models address this by adapting the predictor to time-varying data and capturing correlations over sequences of data, improving the reliability of the predictions of the parameters over time and reducing the time consumption that a conventional VQA would require.</p>
      </abstract>
      <kwd-group>
        <kwd>Variational quantum algorithms</kwd>
        <kwd>Parameterized quantum circuits</kwd>
        <kwd>Parameter estimation</kwd>
        <kwd>Sequence modeling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Effective initialization and estimation of the circuit parameters can significantly improve the performance of VQAs by guiding them towards viable subspaces and
increasing the likelihood of successfully navigating the optimization landscape.</p>
      <p>To achieve that, it is important to rely on computational solutions capable of making non-intuitive
decisions, exploring complex high-dimensional spaces, and identifying patterns and correlations that
are not immediately obvious. It is also desirable that these solutions are robust to noise and errors
while reducing the time and resources needed for empirical trial-and-error methods [3]. This is where
machine learning (ML) algorithms can play a decisive role. ML models can systematically provide
parameter estimations, efficiently search for optimal parameters, and offer substantial improvements in
the execution of quantum algorithms.</p>
      <p>The problem has only recently received significant attention from the research community. Most
efforts have focused on using predictive ML models to initialize the parameters of quantum-classical
optimizers, such as the Quantum Approximate Optimization Algorithm (QAOA) [4, 2]. These approaches
typically rely on batch-style computation, where the data to train the model are all available at one time
and remain unchanged throughout the process. The predicted parameter values for the optimizer reflect
this assumption of data stability. However, in many real-world applications, data are not available
all at once; especially in dynamic and time-varying environments, data arrive in sequence, and their
properties may change [5, 6, 7] or may have some form of dependency or correlation with those of the
data that have previously arrived [8]. So, the reliability of an ML model trained on the data available
at the moment diminishes when it is later used on newly arrived data. To address these issues, the
time-variability of data properties and the correlation between data points should be explicitly considered
when training the model. Sequential ML models offer a solution by building predictors that account for the
structure of sequential relationships between data, allowing them to capture patterns and correlations over
time or in ordered data [9].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Contribution</title>
      <p>In this work, we explore a paradigm shift in the design of VQAs by studying how ML can enhance the
construction of VQA models on sequential data. Specifically, we study how predictive ML techniques
can assist conventional optimizers in forecasting the parameter values of parameterized quantum
circuits. The goal is to help optimizers achieve convergence or provide optimal values with fewer
iterations compared to using optimization alone. This benefit becomes even more significant when
conventional optimizers are applied to sequential data, as they would need to be executed from scratch
repeatedly on newly acquired data, leading to time-consuming calculations. At each execution, the
optimizer might even face the barren plateau problem.</p>
      <p>Our proposal stems from recognizing the intrinsic property of correlation in sequential data.
Addressing it is crucial not only for designing the appropriate machine learning algorithm to train
the forecasting model but also for improving the accuracy of the predicted parameters. This correlation
typically manifests in two forms: temporal locality and temporal recurrence [10, 11]. Temporal locality
suggests that recent data is more strongly correlated with, or predictive of, future data compared to older
data. Temporal recurrence refers to the regular repetition of certain patterns or sequences over time.
While temporal locality emphasizes short-term influence, temporal recurrence focuses on patterns that
reappear at regular intervals over longer periods. Consequently, this correlation is mirrored in the
behavior of the optimizing function during exploration of the parameter space: such a function
replicates its response when these recurring patterns in the data emerge. Based on this, we propose
modeling the behavior of the optimizer based on the parameter values it generates while minimizing the
loss function, rather than focusing directly on the input data. The loss function in this work measures
the performance of a VQA, specifically in a binary classification task, by quantifying the error between
the class provided by the VQA and the actual label. We conjecture that training the forecasting model
using the parameter values obtained during loss minimization could serve as a reliable strategy, offering
valuable guidance to the optimizer for future classification runs over sequential data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <p>To combine both forms of correlation in a machine learning (ML) model capable of forecasting the
parameters for future optimizer runs, we propose a computational solution based on sequence modeling
neural networks for time-series forecasting, as illustrated in Figure 1. This solution is structured around
two interrelated methods.</p>
      <p>The first method addresses the binary classification task handled by a Variational Quantum Algorithm
(VQA) and operates directly on sequential data. The data arrive at regular intervals but in a scattered
manner, without a predefined order between labeled and unlabeled instances. As the data are received,
they are collected into a fixed-size data container, storing both labeled and unlabeled data. Using this
container helps alleviate the processing load caused by the data arrival rate. So, the first method is able
to train the binary classifier (through the VQA) from the labelled data and to assign labels to the
unlabelled data.</p>
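      <p>To make the container concrete, the following is a minimal Python sketch of such a fixed-size buffer; the class and method names (DataContainer, add, split) are our own illustration, not identifiers from the paper's implementation.</p>
      <preformat>
from collections import deque

class DataContainer:
    """Fixed-size buffer for incoming labelled and unlabelled instances."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest instances drop out first

    def add(self, features, label=None):
        # label is None for unlabelled instances arriving in the stream
        self.buffer.append((features, label))

    def split(self):
        # separate the container into training data and data to be classified
        labelled = [(x, y) for x, y in self.buffer if y is not None]
        unlabelled = [x for x, y in self.buffer if y is None]
        return labelled, unlabelled
      </preformat>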
      <p>During training, the VQA assigns the values predicted by the second method (forecasting model) to
its parameters. Then, it tries to refine them along a pre-defined number (n) of iterations on the labelled
data. More precisely, we implemented two variants: the first one (afterwards, For+It) initializes the
parameters based on the forecasting model and then refines them over n iterations; the second one
(afterwards, It+For+It) runs n/2 iterations of the optimizer, then calls the forecasting
model, and, finally, once again runs n/2 iterations. The binary classifier so learned is then
applied to the unlabeled data.</p>
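      <p>The two variants can be summarized by the following Python sketch; the optimizer and forecaster interfaces (minimize, predict_next, update) are hypothetical placeholders for COBYLA and for the LSTM forecasting model described below.</p>
      <preformat>
def for_it(forecaster, optimizer, loss, n):
    # For+It: initialize from the forecast, then refine for n iterations
    theta = forecaster.predict_next()
    return optimizer.minimize(loss, theta, maxiter=n)

def it_for_it(forecaster, optimizer, loss, theta0, n):
    # It+For+It: n/2 iterations, forecast, then n/2 more iterations
    theta = optimizer.minimize(loss, theta0, maxiter=n // 2)
    forecaster.update(theta)          # intermediate values feed the forecaster
    theta = forecaster.predict_next()
    return optimizer.minimize(loss, theta, maxiter=n // 2)
      </preformat>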
      <p>As an initialization for the entire process, we consider a number (S) of training sessions on the
labelled data accumulated in the container. Each initial training session runs for I optimizer iterations,
where I is chosen to be greater than n in order to build reliable parameter values, which will be used to
feed the forecasting model. Specifically, we chose the k iterations (out of I) in which the loss function of
the classifier (first method) takes its lowest values. The rationale is to make the forecasting model learn
the behavior of the optimizing function in the minimization of the objective function. Eventually, we obtain k
parameter configurations from each of the S training sessions; each configuration presents a number
P of values equal to the number of parameterized gates of the VQA.</p>
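      <p>A possible selection step, under the assumption that the optimizer exposes the parameter vector and loss value of every iteration, is sketched below in Python (the helper name select_k_best is ours).</p>
      <preformat>
import numpy as np

def select_k_best(params_history, loss_history, k):
    """params_history: (I, P) array, one parameter vector per iteration;
    loss_history: (I,) array of loss values for the same iterations."""
    best = np.argsort(loss_history)[:k]  # indices of the k lowest losses
    best = np.sort(best)                 # keep them in temporal order
    return params_history[best]          # (k, P) selected configurations
      </preformat>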
      <p>The forecasting model consists of P LSTM components, each assigned to one parameter and trained
on the values generated during the training sessions. Each LSTM processes subsequences extracted
from an overall sequence S × k time steps long (k time steps for each of the S training sessions).
Each subsequence contains T time steps, and the LSTM predicts a single numeric value within the
range [0, 2π], based on the previous T time steps. Subsequences are formed by shifting the previous
subsequence by one time step. The LSTM learns by minimizing the prediction error over these shifted
subsequences.</p>
      <p>So, given a sequence ⟨v_1, v_2, . . . , v_{S×k}⟩ representing the entire time span from the training sessions,
we extract overlapping subsequences of length T by shifting each subsequence by one time step:
⟨v_1, v_2, . . . , v_T⟩ → v_{T+1},
⟨v_2, v_3, . . . , v_{T+1}⟩ → v_{T+2}, . . . ,
⟨v_{(S×k)−T}, . . . , v_{(S×k)−1}⟩ → v_{S×k}.</p>
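      <p>In Python, the sliding-window extraction for one parameter can be sketched as follows (make_subsequences is an illustrative helper, not the paper's code).</p>
      <preformat>
import numpy as np

def make_subsequences(sequence, T):
    """Build (input, target) pairs from the S*k-long sequence of values."""
    X, y = [], []
    for start in range(len(sequence) - T):
        X.append(sequence[start:start + T])  # window v_start ... v_{start+T-1}
        y.append(sequence[start + T])        # the next value to predict
    return np.asarray(X), np.asarray(y)
      </preformat>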
      <p>1 ∑︀=1 ∑︀=− 1 (︀ ^, − ,+ ︀) 2
The LSTM is trained by minimizing the mean squared error (MSE), between the predicted values ^,
and the true values ,+ :
Generally, speaking a LSTM neural network consists of units each comprising three kinds of gates,
forget gate, input gate, and output gate and keeps two model snapshots updated over processing time
stepped data, hidden state and cell state. Forget gate filters out irrelevant parts of the previous cell
state. Input gate decides what new information will be added to the cell state. Output gate determines
the next hidden state based on the updated cell state. The cell state represents the long-term memory
and is updated by combining the forget gate and input gate information. The hidden state represents
the short-term memory, which is used to make predictions. They operate at each time-step as here
formulated:
• Forget gate:  =  ( + ℎ− 1 +  )
• Input gate:  =  ( + ℎ− 1 + )
• Cell state update:  =  ⊙ − 1 +  ⊙ ˜ where ⊙ denotes element-wise multiplication,
˜ = tanh( + ℎ− 1 + ), ℎ− 1 is the previous hidden state vector
• Output gate:  =  ( + ℎ− 1 + ), with hidden state ℎ updated as: ℎ =  ⊙ tanh().
where,
•  is the number of units used in the whole model</p>
      <p>
• σ is the sigmoid activation function, σ(x) = 1 / (1 + e^{−x})
• tanh(x) = (e^x − e^{−x}) / (e^x + e^{−x}) is the hyperbolic tangent activation function
• f_t, i_t, o_t ∈ (0, 1)^d, c̃_t, h_t ∈ (−1, 1)^d, c_t ∈ R^d
• W_f, W_i, W_c, W_o ∈ R^{d×1} are weight matrices on the input values
• U_f, U_i, U_c, U_o ∈ R^{d×d} are weight matrices on the hidden state
• b_f, b_i, b_c, b_o ∈ R^d are bias vectors learned during training.
      </p>
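      <p>For concreteness, a direct NumPy rendering of the gate equations above for a single time step is given below; it assumes already-learned weights, a scalar input value per time step, and d hidden units.</p>
      <preformat>
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """W['f'|'i'|'c'|'o']: (d,) input weights, U[...]: (d, d) recurrent
    weights, b[...]: (d,) biases; x_t is one scalar parameter value."""
    f = sigmoid(W['f'] * x_t + U['f'] @ h_prev + b['f'])     # forget gate
    i = sigmoid(W['i'] * x_t + U['i'] @ h_prev + b['i'])     # input gate
    c_tilde = np.tanh(W['c'] * x_t + U['c'] @ h_prev + b['c'])
    c = f * c_prev + i * c_tilde                             # cell state update
    o = sigmoid(W['o'] * x_t + U['o'] @ h_prev + b['o'])     # output gate
    h = o * np.tanh(c)                                       # new hidden state
    return h, c
      </preformat>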
      <p>
        After processing T time steps, the LSTM predicts the next numeric value v̂ ∈ [0, 2π]. This is done by
applying a linear transformation to the hidden state h_T at the last time step t = T:
v̂_{T+1} = h_T W_y + b_y
(1)
where W_y ∈ R^{d×1} and b_y ∈ R are the output weight matrix and bias, respectively.
      </p>
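      <p>Since the classical LSTM is implemented with TensorFlow [13], one of the P per-parameter forecasters could be built as in the following sketch, where the Dense(1) layer plays the role of the linear head of Equation (1); the hyperparameter choices shown are assumptions, not the paper's settings.</p>
      <preformat>
import tensorflow as tf

def build_forecaster(T, d=32):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(T, 1)),  # T scalar time steps
        tf.keras.layers.LSTM(d),              # final hidden state h_T, (batch, d)
        tf.keras.layers.Dense(1),             # v_hat = h_T W_y + b_y, Eq. (1)
    ])
    model.compile(optimizer='adam', loss='mse')  # ADAM + MSE, as in the paper
    return model

# X, y = make_subsequences(sequence, T)
# model = build_forecaster(T=4)
# model.fit(X[..., None], y, epochs=200, verbose=0)
      </preformat>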
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>The proposed solution has been implemented using IBM Qiskit [12], and is designed to work across a
broad range of VQAs without loss of generality. For the current experiments, we used the quantum
circuit shown in Fig. 2, which is commonly used as an ansatz in quantum machine learning approaches.
This circuit consists of 8 qubits and an equal number of gates per qubit. The design ensures that the
parameterized gates are influenced not only by the encoding of classical data but also by two-qubit
operations. The circuit includes 32 parameters to be optimized, corresponding to a forecasting model
comprising 32 LSTMs.</p>
      <p>The VQA leverages the pre-existing COBYLA optimizer for parameter tuning. For the forecasting
model, we tested two implementations: a classical version using a standard LSTM [13] (referred to as
LSTM) and a hybrid quantum-classical implementation [14] (referred to as QLSTM), both trained with
the gradient descent-based ADAM optimization algorithm. We conducted experiments using the Aer
simulator on the real-world Spambase dataset, which contains 4600 data instances and 57 features1. A
classical feature selection technique was applied to reduce the feature set to eight.</p>
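      <p>As a hedged illustration of the setup (the actual ansatz is the circuit of Fig. 2, which is not reproduced here), an 8-qubit ansatz with 32 trainable parameters can be obtained in Qiskit with an EfficientSU2-style circuit, and COBYLA can be driven through SciPy.</p>
      <preformat>
from qiskit.circuit.library import ZZFeatureMap, EfficientSU2
from scipy.optimize import minimize

n_qubits = 8
feature_map = ZZFeatureMap(n_qubits)     # encodes the 8 selected features
ansatz = EfficientSU2(n_qubits, reps=1)  # 2 * 8 * (reps + 1) = 32 parameters
assert ansatz.num_parameters == 32

circuit = feature_map.compose(ansatz)

# loss(theta) would bind theta to the ansatz, run the circuit on the Aer
# simulator, and return the classification error on the labelled data:
# result = minimize(loss, theta0, method="COBYLA", options={"maxiter": 20})
      </preformat>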
      <p>We simulated the scenario of incrementally incoming data by arranging the dataset in equally-sized
data blocks, which were acquired one at a time and stored in the container. Each data block consisted of
75% labelled data and 25% unlabelled data. We considered three different numbers of data blocks, 20, 40,
and 60, corresponding to 230, 115, and 77 data instances per block respectively. The number of initial
training sessions S was chosen as a portion (20%) of all the data blocks; consequently, we have S equal
to 4, 8, and 12. We defined k (the number of time steps considered, in which the objective function takes
its lowest values) as 4. The number T of time steps was chosen as {2, 3, 4} for S=4, {5, 6, 7, 8} for S=8,
and {9, 10, 11, 12} for S=12. Longer subsequences (higher T) are obtained with higher values of S.
Orthogonally, the number of iterations I was set to 20, while the number of iterations n is 8 when I is 20.
To clarify, for the setting S=4, T=4, For+It, I=20, we trained the binary classifier over 4 training
sessions, each using the labelled portion (75%) of a 230-instance data block; the VQA works on 20
iterations during each training session; and the forecasting model learns on subsequences of 5 (T+1)
time steps, extracted from an overall sequence 4 × 4 (S × k) time steps long. Once the S data blocks
were processed, for the subsequent data blocks, the parameters of the VQA are either i) initialized using
the forecasting model and refined through 8 iterations, in the case of For+It, or ii) tuned over 4 iterations,
whose values feed the forecasting model, followed by another refinement round of 4 iterations, in the
case of It+For+It.</p>
      <p>Experiments were conducted to evaluate the accuracy in a binary classification task and to measure the
running times for different lengths, T, of the training subsequences. Table 1 presents the F1-scores,
calculated as the average across the individual values derived from the data blocks. The results show that the
introduction of a forecasting model generally improves the predictive accuracy of the binary classification
for all values of T, except for T=10 (in bold).</p>
      <p>It is important to note that not all predictors perform consistently. Specifically, the model LSTM
For+It n=8 shows improvements for T values in the range {2, 3, 4, 5, 6, 7, 8}, whereas the model
QLSTM For+It n=8 performs better for longer subsequences, specifically {9, 11, 12}. This suggests that
the classical LSTM model performs well on short to medium-length subsequences, while the quantum
LSTM model is more effective for the longer ones.</p>
      <p>Another observation concerns the model variants For+It and It+For+It, which define how the VQA
leverages the forecasting results. In all trials, the models using It+For+It consistently performed worse
than their counterparts, both for LSTM and QLSTM. This underperformance may be attributed to
the initial n/2 iterations, which, starting from a uniform setting, do not provide sufficient input for the
forecasting model to enhance performance.</p>
      <p>[Table 1: F1-scores of the plain VQA and of the variants LSTM For+It n=8, LSTM It+For+It n=8,
QLSTM For+It n=8, and QLSTM It+For+It n=8.]</p>
      <p>1. https://archive.ics.uci.edu/ml/datasets/Spambase</p>
      <p>[Figure 3: running times at different values of T; panels (a) S = 4, (b) S = 8, (c) S = 12.]</p>
      <p>Figure 3 shows the running times of the three main approaches executed on the simulator, averaged
at different values of T (the length of the subsequences). The predictors require more time only at the
beginning, corresponding to the S training sessions. Conversely, they require less time on all the
remaining data blocks, with savings of several hundreds in computation time. We observe that this
happens with both the LSTM models and the QLSTM models.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>Alleviating the workload of VQAs is a challenging and attractive research area aimed at developing
solutions that enable a broader range of practical applications demonstrating quantum advantages,
including quantum machine learning. In this work, we explore a new perspective in VQA design,
where the estimation of parameter values—traditionally handled using optimization techniques—can be
supported by predictive machine learning. The usefulness of this approach becomes even more evident
when the VQA is required to operate repeatedly, as in the case of continuous data flows. This work
focuses on simulation-based systems, without considering noise-related issues.</p>
      <p>While our experiments are limited to a single case study and are not exhaustive, the results are
encouraging. They suggest that integrating forecasting models into VQAs improves both the quality and
effectiveness of the algorithms. However, not all predictors and experimental configurations guarantee
this improvement, a topic we will investigate further. For instance, we plan to study the interactions
between parameters in multivariate time-series data, such as with LSTM or QLSTM models.</p>
      <sec id="sec-5-1">
        <title>Acknowledgment</title>
        <p>Corrado Loglisci and Marjana Skenduli acknowledge the financial support from the project "PNRR
MUR project PE0000023-NQSTI" for this research.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[1] L. Madsen, F. Laudenbach, M. Askarani, F. Rortais, T. Vincent, J. Bulmer, F. Miatto, L. Neuhaus,
L. Helt, M. Collins, A. Lita, T. Gerrits, S. W. Nam, V. Vaidya, M. Menotti, I. Dhand, Z. Vernon,
N. Quesada, J. Lavoie, Quantum computational advantage with a programmable photonic processor
(2022). doi:10.1038/s41586-022-04725-x.
[2] Z. Liang, G. Liu, Z. Liu, J. Cheng, T. Hao, K. Liu, H. Ren, Z. Song, J. Liu, F. Ye, et al., Graph
learning for parameter prediction of quantum approximate optimization algorithm, arXiv preprint
arXiv:2403.03310 (2024).
[3] J. Falla, Q. Langfitt, Y. Alexeev, I. Safro, Graph representation learning for parameter transferability
in quantum approximate optimization algorithm, Quantum Machine Intelligence 6 (2024) 46.
[4] F. Meng, X. Zhou, Parameter generation of quantum approximate optimization algorithm with
diffusion model, arXiv preprint arXiv:2407.12242 (2024).
[5] C. Loglisci, D. Malerba, S. Pascazio, Quarta: quantum supervised and unsupervised learning
for binary classification in domain-incremental learning, Quantum Mach. Intell. 6 (2024) 68.
doi:10.1007/s42484-024-00196-7.
[6] G. I. Parisi, R. Kemker, J. L. Part, C. Kanan, S. Wermter, Continual lifelong learning with neural
networks: A review, Neural Networks 113 (2019) 54–71. doi:10.1016/j.neunet.2019.01.012.
[7] C. Loglisci, D. Malerba, Coupling quantum classification and quantum distance estimation in
continual learning, in: AIQxQIA@AI*IA, volume 3586 of CEUR Workshop Proceedings,
CEUR-WS.org, 2023.
[8] H. Homayouni, S. Ghosh, I. Ray, S. Gondalia, J. Duggan, M. G. Kahn, An autocorrelation-based
LSTM-autoencoder for anomaly detection on time-series data, in: 2020 IEEE International Conference on
Big Data (Big Data), 2020, pp. 5068–5077. doi:10.1109/BigData50022.2020.9378192.
[9] Z. Xie, Y. Yang, Y. Zhang, J. Wang, S. Du, Deep learning on multi-view sequential data: a survey,</p>
      <p>Artif. Intell. Rev. 56 (2023) 6661–6704.
[10] R. A. Rossi, Relational time series forecasting, The Knowledge Engineering Review 33 (2018) e1.</p>
      <p>doi:10.1017/S0269888918000024.
[11] C. Loglisci, I. Diliso, D. Malerba, A hybrid quantum-classical framework for binary classification
in online learning, in: SEBD, volume 3478 of CEUR Workshop Proceedings, CEUR-WS.org, 2023, pp.
88–99.
[12] M. S. Anis, et al., Qiskit: An open-source framework for quantum computing, 2021. doi:10.5281/
zenodo.2573505.
[13] M. Abadi, et al., Tensorflow: Large-scale machine learning on heterogeneous systems, https:
//www.tensorflow.org/, 2015.
[14] S. Y.-C. Chen, S. Yoo, Y.-L. L. Fang, Quantum long short-term memory, 2020. URL: https://arxiv.
org/abs/2009.01783. arXiv:2009.01783.</p>
    </sec>
  </body>
  <back>
  </back>
</article>