<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>AIQ x QIA</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Reinforcement Learning for Variational Quantum Circuit Design</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simone Foderà</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gloria Turati</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Riccardo Nembrini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maurizio Ferrari Dacrema</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Paolo Cremonesi</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ludwig Maximilians Universität</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Politecnico di Milano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>0000</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>Variational Quantum Algorithms (VQAs) are widely used for solving optimization problems in the Noisy Intermediate-Scale Quantum (NISQ) era. However, designing efective quantum circuits (ansatzes) that are compatible with the limitations of current quantum hardware remains a significant challenge. In this work, we introduce a Reinforcement Learning (RL) agent that autonomously generates ansatzes for VQAs. The RL agent is trained on several optimization problems, including Maximum Cut, Maximum Clique, and Minimum Vertex Cover, across diferent graph topologies. Our results show that the agent is able to generate efective quantum circuits, with approximation ratios that favorably compare to commonly used ansatzes. Additionally, we identify a novel family of ansatzes, termed “-connected”, particularly efective on Maximum Cut problems. These ifndings highlight the potential of RL techniques in designing eficient quantum circuits for a broad class of applications in quantum computing.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Variational Quantum Algorithms</kwd>
        <kwd>Reinforcement Learning</kwd>
        <kwd>Ansatz</kwd>
        <kwd>Quantum Computing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Quantum computing has gained attention for its potential to tackle problems that are computationally
challenging for classical computers. Variational Quantum Algorithms (VQAs) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which employ a
parametric quantum circuit and a classical optimizer to adjust the circuit parameters, represent a
promising paradigm in the Noisy Intermediate-Scale Quantum (NISQ) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] era. However, a key challenge
in these algorithms lies in identifying an appropriate ansatz for the specific problem being addressed.
Approaches for selecting suitable ansatzes include leveraging problem-specific properties, such as
symmetries [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3–5</xref>
        ], or using adaptive methods that dynamically modify circuits by adding and removing
gates during execution [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6–8</xref>
        ]. However, identifying which problem properties can be exploited is
non-trivial and adaptive methods rely on carefully designed heuristics and may require numerous
circuit executions to converge to a suitable ansatz.
      </p>
      <p>Reinforcement Learning (RL) is a Machine Learning paradigm where an agent learns to perform
actions to achieve a desired goal. By receiving a reward for each action taken, the agent iteratively
refines its strategy to maximize the cumulative reward, ultimately achieving the desired objective. The
RL paradigm is highly flexible and has already been successfully deployed for many tasks in quantum
computing [9–12], including circuit design [13, 14]. One crucial diference between the RL paradigm
and the existing adaptive approaches is that it is possible with RL to learn how to build circuits without
relying on manually designed heuristics or domain-specific knowledge. Furthermore, RL can be efective
in scenarios with very large solution spaces, such as the one of possible quantum circuits.</p>
      <p>In our work we present a RL-based algorithm designed to search for variational quantum circuits for
solving optimization problems. We train the agent on instances of the Maximum Cut, Maximum Clique,
and Minimum Vertex Cover problems and analyze the solution quality and the properties of the circuits
obtained. Our findings show that the agent is capable of building circuits leading to satisfactory results,
particularly on the Maximum Cut problem.</p>
      <p>Moreover, during training on the Maximum Cut problem, the agent discovered efective circuits with
a regular structure, which we denote as -connected, characterized by circuits with only  gates
connecting all the qubits. We test a specific member of this family, referred to as Linear circuit, by
comparing it to state-of-the-art quantum algorithms on a range of Quadratic Unconstrained Binary
Optimization (QUBO) problems with varying underlining graph topologies and sizes. Our findings
reveal that the Linear circuit is able to find high-quality solutions for the Maximum Cut problem, but it
is less efective for the other tasks. Finally, we discuss how -connected could be implemented on
the hardware.</p>
      <p>In summary, our contributions are the following:
• we introduce a Reinforcement Learning agent designed to autonomously generate Variational</p>
      <p>Quantum Circuits for solving optimization problems;
• we show that the RL-generated circuits achieve high approximation ratios across a variety of
optimization problems;
• we identify and analyze a novel family of circuits discovered by the agent, termed -connected,
which generalizes efectively and can be applied to other instances of the same problem.</p>
      <p>Overall, our study shows the potential of RL-based approaches for constructing efective variational
quantum circuits. Moreover, this methodology holds promise for broader applications within quantum
computing, including designing more general circuits or optimizing circuit properties.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Variational Quantum Algorithms</title>
      <p>
        Variational Quantum Algorithms (VQAs) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] represent a class of hybrid quantum algorithms that can
be used to tackle certain optimization problems and are characterized by the presence of a parametric
quantum circuit, denoted as ansatz, and a classical optimizer. The goal of the classical optimizer is to
adjust the circuit parameters by minimizing a predefined cost function, in such a way that the execution
of the circuit with optimized parameters yields the solution to the given optimization problem. In
recent years, VQAs have known large difusion due to their potential to tackle complex problems
using near-term quantum devices. Indeed, the variational approach enables the utilization of shallower
circuits, making these algorithms more resilient to noise and qubit decoherence.
      </p>
      <p>One of the most widely utilized VQAs is the Variational Quantum Eigensolver (VQE) [15, 16], which
allows to determine the ground state of a given Hamiltonian , and is largely applied in quantum
chemistry [17, 18]. Specifically, this algorithm employs a suitable ansatz capable of generating the
ifnal parametric state | ( )⟩, where  denotes a parameter vector. The objective is to identify the
circuit parameters that minimize the cost function ⟨ ( )|| ( )⟩, also denoted as ⟨⟩, through a
variational approach. Notice that, when using real quantum hardware, we do not have direct access to
the expectation ⟨ ( )|| ( )⟩, but this quantity can be estimated by executing the circuit a number
shots of times and computing the quantity:
⟨⟩* = ⟨ ( )|| ( )⟩* =
shots
∑︁ ⟨ ˜ || ˜ ⟩,
1
shots =1
(1)
where | ˜ ⟩ denotes the basis state obtained after performing the measurement at the -th execution of
the circuit.</p>
      <p>Another widely adopted VQA is the Quantum Approximate Optimization Algorithm (QAOA) [19, 20],
often employed to address combinatorial optimization problems [21–27]. Similarly to VQE, QAOA
aims to minimize the expectation value of a designated cost Hamiltonian on the circuit’s final state.
However, QAOA employs a predefined circuit comprising an initial layer of Hadamard gates followed
by  layers, each implementing two parametric operators: the cost operator, which depends on the cost
Hamiltonian, and the mixer operator, which allows to further explore the solution space. QAOA can be
considered as a discretized form of continuous-time quantum evolution.</p>
      <p>One variant of QAOA is the multi-angle QAOA (ma-QAOA) [28], which shares the same objective
and circuit structure as QAOA, but uses distinct parameters for each parametric gate associated with
the cost and mixer operators. Whereas ma-QAOA enhances space exploration capability using the same
circuit structure and depth as QAOA, it also increases the computational load on the classical optimizer
due to the higher number of parameters, rendering a careful trade-of necessary.</p>
      <p>Another variant, QAOA+ [29], extends the ansatz used by QAOA with  = 1 by introducing an
additional problem-independent layer consisting of  gates, and a mixer layer with  gates. This
results in a circuit with Hadamard, cost and mixer layer, and the two newly introduced components.
Notably, this structure is not repeated. Since each gate in the additional layers is independently
parameterized, the QAOA+ ansatz results in a deeper circuit with 2− 1 additional parameters compared
to QAOA with  = 1. Similar to ma-QAOA, the higher number of parameters enables a broader
exploration of the solution space, but at the same time increases the computational burden on the
classical optimizer.</p>
      <p>
        However, one critical challenge in the application of VQAs is the identification of suitable circuits for
addressing specific problems [ 30–34]. An ideal ansatz should have a limited number of gates and depth
to mitigate noise and decoherence, utilize the native gates of the hardware to simplify implementation
and reduce circuit depth, as well as efectively sample the correct solution to the optimization problem.
In particular, one significant obstacle to finding the correct solution to the optimization problem lies
in the phenomenon of barren plateaus [35–41]. Barren plateaus occur when the gradient of the cost
function exponentially vanishes as the system size increases, resulting in a flat landscape in which the
classical optimizer struggles to escape local optima and may not be able to find good parameters for the
circuit. To address this challenge, strategies for selecting suitable ansatzes have been explored. These
include leveraging problem-specific properties [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3–5, 19</xref>
        ] and employing adaptive algorithms [
        <xref ref-type="bibr" rid="ref6 ref7 ref8">6–8</xref>
        ].
      </p>
      <p>Adaptive Variational Quantum Algorithms are a family of methods which dynamically construct the
quantum circuit by iteratively adding and removing gates throughout the algorithm execution. This
approach allows to explore various gate configurations and choose the most suitable circuit structures
for a given problem. One category of adaptive VQAs, proposed in the literature, includes ADAPT-VQE
[42], qubit-ADAPT-VQE [43], QEB-ADAPT-VQE [44], and Overlap-ADAPT-VQE [45], adaptive variants
of VQE. These algorithms aim to find the ground state of a Hamiltonian operator representing the
energy of a molecule, and their distinguishing feature lies in the construction of the ansatz, which
is achieved by adding gates selected from a pool which depends on the specific chemistry problem
being addressed. Additionally, there exists an adaptive version of QAOA [46], which, at each iteration,
determines the most appropriate mixer for the following layer of the circuit. Other adaptive VQAs
adopt a genetic approach [47–49], employ more generalized Machine Learning techniques for circuit
construction [50–52] or make use of Reinforcement Learning as described in Section 3.1.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Reinforcement Learning</title>
      <p>Our objective is to use a Machine Learning model to design new circuits capable of finding the ground
state of a Hamiltonian. This problem requires exploring a large solution space of potential circuits,
and is strongly influenced by the specific Hamiltonian under consideration. Consequently, creating
an exhaustive dataset of quantum circuits for supervised learning is impractical. Thus, we choose
to use a Reinforcement Learning (RL) approach, where an agent interacts with an environment to
achieve a specific goal. This enables us to generate data, i.e. quantum circuits, and evaluate their
performance during training. Because of the complexity of this topic, in this section we only describe
the fundamental concepts of RL and the selected algorithm. For a comprehensive overview of this</p>
      <sec id="sec-3-1">
        <title>Agent</title>
        <p />
      </sec>
      <sec id="sec-3-2">
        <title>Environment</title>
        <p>...</p>
        <p>technology, the interested reader may refer to [53].</p>
        <p>In RL, the agent interacts with the environment in discrete time steps. At each step , the environment
is in a particular state , which is observed by the agent. Based on this observation, the agent performs
an action , which may have an efect on the environment, causing the transition to a new state +1.
The agent receives a reward , a value depending on the quality of the action taken (see Fig. 1). Starting
from an initial state and cyclically interacting with the environment until a termination condition is
reached, the agent completes an episode. Let us consider a practical example, where the agent is a
robot whose goal is to move an object from one place to another in an environment constituted by a
table. In this scenario, the actions correspond to the movements that the robot’s mechanical arms can
perform, the state of the environment is represented by the current position of the object on the table,
and the reward is a function which increases as the distance between the object and its target location
decreases. An episode starts with the object in an initial position on the table and ends when the robot
has successfully placed the object at the target position.</p>
        <p>Spanning through multiple episodes, the objective of the agent is to learn which actions to perform
in order to obtain the highest cumulative reward, or return. Specifically, the return at time step  is the
weighted sum of rewards starting from  until the end of the episode:

=0
 = ∑︁  ++1
where  is the number of remaining steps until the end of the episode and  ∈ (0, 1) represents the
discount factor. This factor serves two purposes: facilitating convergence and balancing short-term
versus long-term rewards.</p>
        <p>When choosing a new action in each state, the agent follows a policy  , which is a probability
distribution on the set of all actions, conditioned on the state:</p>
        <p>(|) =  ( =  |  = )</p>
        <p>Various methods exist to learn an optimal policy. In this work we use Proximal Policy Optimization
(PPO), which is currently a state-of-the-art Deep Reinforcement Learning algorithm introduced by
(2)
(3)
OpenAI [54]. PPO makes use of two neural networks with the same structure, respectively called policy
and value network (Fig. 2). Both networks receive an appropriate representation of the environment’s
state as input. The policy network then outputs a probability distribution on the possible actions at
time step , from which action  is then sampled. On the other hand, the value network outputs a
single value estimating the value function, which quantifies the quality of the state in terms of expected
return when following the policy  :</p>
        <p>() = E [  |  =  ]
 (, ) =  (, ) −  ()
While the policy network efectively chooses the actions to perform, the value network is used to
estimate the so-called advantage function:
Here,  represents the state-action value function, which is the expected return obtained by starting in
state , taking action  and then following the policy  :
 (, ) = E [  |  = ,  =  ]
(6)
Thus, the advantage measures the potential benefit of taking action  in state , independently of the
quality of that state. In order to learn an optimal policy, PPO performs gradient ascent with the goal of
maximizing an objective function that incorporates both the advantage and a loss that helps the value
network in learning how to better estimate the real value function. Moreover, a clipping mechanism is
implemented to avoid a too large update to the policy during training. For further details on the loss
and implementation, we refer to the original articles [54, 55].</p>
        <sec id="sec-3-2-1">
          <title>3.1. Reinforcement Learning for Quantum Computing</title>
          <p>In recent years, RL techniques have been applied to various challenging tasks in quantum computing.
Some studies focus on optimizing circuit parameters [9–11], while others tackle circuit learning tasks
consisting in generating quantum circuits to transform an initial state into a target state [56–60]. This
task is particularly useful because it can be applied to the crucial step of quantum circuit compiling.
Additionally, RL has been used to learn optimization strategies aimed at reducing circuit depth and gate
count [61]. Furthermore, RL algorithms have been applied to the task of designing quantum circuits
tailored for Machine Learning [13] and optimization [14] problems. Specifically, [ 13] discusses the use
of RL in designing parameterized quantum circuits for classification tasks, while [ 14] employs RL to
ifnd suitable ansatzes for VQE to determine the ground state energy of specific molecules. It is worth
noting that the approach taken by [14] tackles our same task of finding circuits to generate the ground
state of a Hamiltonian. However, the RL agent is specifically tailored for chemistry problems, and the
architecture may not be directly applicable to other domains. In contrast, our work introduces a novel
RL algorithm designed for more general optimization problems and difers in many crucial components:
architecture for the agent, state representations, rewards, and available actions.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Agent Design and Training</title>
      <p>In this study we propose a RL-based algorithm called Reinforcement Learning for Variational Quantum
Circuits (RLVQC), designed to learn how to build new quantum circuits aimed at finding the ground
state of the Hamiltonian associated with a given optimization problem. In this section we describe the
diferent components of RLVQC, including environment, actions, and reward. Then, we present the
experimental setup and specify some implementation details useful to reproduce the experiments.
(4)
(5)

( )
Time step  − 1</p>
      <sec id="sec-4-1">
        <title>4.1. Environment, Actions, Reward</title>
        <p>In RLVQC, the environment is represented by a parametric circuit with  qubits. At each step , the
agent performs an action consisting in adding a new gate to the circuit. The environment configuration
at the beginning of each episode is represented by a circuit with a single layer of Hadamard gates. The
action set  comprises the gates that the agent can insert into the circuit. Specifically,  is the union of
the following sets:
•  = {( ) |  ∈ {, , },  = 0, ...,  − 1}
•  = {( ) | ,  ∈ {, , }, ,  = 0, ...,  − 1,  &lt; }
where  is a single rotation gate applied to qubit  and  is a double rotation applied to a pair of
diferent qubits  and . Formally, denoting as   the Pauli- matrix for all ,  ∈ {, , }, the double
rotations are defined as:
( ) = −  2  ⊗  
(7)
A specific double rotation ( ) can be decomposed in simpler gates as follows:
( )
=
where  and  are gates which map the  and  axis into the  axis respectively. Thus,  = ,
 = ( 2 ), and  = . These double rotations are able to generate entanglement between qubits.</p>
        <p>The gates in the set  are chosen because they exhibit identity-like behavior when their parameters
are set to 0. This property is advantageous because starting from a circuit with optimized parameters
and adding a new gate with parameters set to 0 allows the optimization process to begin from a favorable
initial point rather than a random one. This approach potentially reduces computational time and
mitigates the risk of converging to a local minimum [62].</p>
        <p>At each time step , the agent chooses a gate as an action to be applied to the environment, which
is updated by adding that gate with parameter   = 0 to the circuit. Notice that all the gates are
independently parametrized. Subsequently, the circuit parameters are optimized using the classical
optimizer COBYLA1, which has been shown to be efective in noise-free scenarios without demanding
an excessive computational cost [63, 64]. COBYLA is run for a maximum of 1000 iterations with the
goal of minimizing the cost function given by the expectation of the problem Hamiltonian on the final
state of the circuit, defined in (1). Such expectation is estimated by simulating the circuit 1000 times2.
The choice of estimating the expectation instead of computing it exactly is made to reflect the use of
an actual quantum computer. Indeed, on real quantum devices, we do not have direct access to the
quantum state, but we can only estimate it by performing multiple measurements.</p>
        <p>After COBYLA converges, the circuit using the optimized parameters is executed an additional 1000
times to obtain the estimated probability distribution of the final state of the circuit, in the form of a
vector of 2 real-valued elements. This probability distribution represents the next state observed by
the agent. Notice that the design of an efective representation for the environment state is not trivial
and depends on several factors related to the specific task and the agent’s architecture. For this reason,
designing new state representations specifically tailored for quantum systems can be considered an
important and wide research area that goes beyond the scope of this paper.</p>
        <p>The reward at time step  is defined to align with our goal of minimizing the expectation value of the
Hamiltonian and the circuit depth while the agent learns to maximize the return (2). Specifically, the
reward is expressed as:
 = −⟨ ⟩* −  · ,
(8)
where ⟨⟩* is an estimate of the current expectation value as defined in (1) and  denotes the current
circuit depth, weighted by the  hyperparameter, which is fixed at 0.015 in our experiments. This
reward is straightforward and directly tied to the quality of the circuit built at time step : the first term
incentivizes the minimization of ⟨⟩* , which is our primary goal, while the second term drives the
agent to prefer shallower circuits, which are more resilient to noise and qubit decoherence. It is worth
noting that in this phase the circuit depth is computed after expressing the circuits in terms of the basis
gates {, , , , }. In particular, while the double rotation  is directly used, the others are
represented in terms of it. Finally, we remark that designing an efective reward function that combines
diferent goals is again a critical aspect in RL which allows for great flexibility. Moreover, it is important
to highlight that the suitability of a particular reward function may vary depending on the nature of the
task and specific requirements. A complete visual overview of the RL pipeline is represented in Fig. 3.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Training Details</title>
        <p>RLVQC uses the version of PPO implemented as described by OpenAI [54, 55], with default
hyperparameters. This implementation employs two multi-layer fully-connected neural networks, each sharing
the same structure but with separate learnable parameters, for the policy and value network. With the
problem size denoted as , the input layer of each neural network comprises 2 neurons, matching the
size of the vector representing the environment state (see Section 4.1). The output layer for the policy
network has size ||, which depends on the number of problem variables. Agent training comprises 64
epochs, with each epoch being a collection of 384 action steps3 after which the parameters of the PPO’s
neural networks are updated. Each epoch consists of multiple episodes, each starting from a circuit
with only Hadamard gates and ending when a termination condition is met. Specifically, an episode
1We use the SciPy implementation of COBYLA, keeping its default hyperparameters. The documentation is available here:
https://docs.scipy.org/doc/scipy/reference/optimize.minimize-cobyla.html
2Circuit simulation is performed using Qiskit QASM Simulator. The documentation is available here: https://qiskit.github.io/
qiskit-aer/stubs/qiskit_aer.QasmSimulator.html
3This number is obtained by having multiple processes performing 64 action steps each in our default implementation.
terminates either when a maximum of 2 ·  actions have been performed or earlier, if the reward does
not improve with new actions. The latter mechanism is controlled by the patience hyperparameter.
Patience is initially set to 3 and decreases every time a step ends with a worse reward than the highest
found during the current episode. Otherwise, it is increased, but only up to its initial value. Stopping
the episode earlier avoids spending time steps on actions that are unlikely to improve the reward.
Throughout training, we keep track of the circuit with the highest reward, which is then analyzed after
training is completed.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Problem Instances</title>
        <p>In our experiments, we evaluate RLVQC on the task of finding the solution of optimization problems
that can be formulated as Quadratic Unconstrained Binary Optimization (QUBO) problems [65]. The
general formulation of a QUBO problem is the following:</p>
        <p>min
∈{0,1}
 Q
where  = {1, . . . , } ∈ {0, 1} and Q denotes an upper triangular or symmetrical  ×  matrix.
QUBO problems can be expressed as equivalent Ising problems through a change of variable [66].
Leveraging this formulation, the objective becomes to identify the ground state of a Hamiltonian
operator, a task which can be accomplished using our algorithm RLVQC.</p>
        <p>We address three diverse optimization problem types: Maximum Cut, Maximum Clique, and Minimum
Vertex Cover, all formulated as QUBO problems [65, 67]. It is important to highlight that Minimum
Vertex Cover and Maximum Clique involve constraints. To incorporate these constraints into the
QUBO formulation, penalty terms are directly integrated into the cost functions. Each penalty term is a
function that yields a value of 1 when a constraint is violated and 0 when it is satisfied, multiplied by a
weight coeficient. These coeficients are chosen to ensure that all the feasible solutions have a lower
expectation than the infeasible ones, thereby guiding the algorithm towards favoring feasible solutions
over infeasible ones. For each problem type, we consider three diferent graph topologies:
• 3-regular, a graph where each vertex is connected to exactly three neighbors;
• 2D-grid, a two-dimensional integer lattice graph where each vertex can have up to 4 neighbours;
• Star graph, characterized by one central node connected to all other nodes, with no further
connections among the non-central nodes.</p>
        <p>For each problem type, we explore graphs with  = 8 and  = 14 vertices across each topology,
resulting in a total of 18 trained agents.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Evaluation Metrics</title>
        <p>Our main goal is to assess whether it is possible to use a RL agent to discover ansatzes capable of
sampling high-quality solutions to optimization problems. We aim to evaluate both the accuracy of
the solutions found and the characteristics of the resulting circuits. To this purpose, the most relevant
metric we use is the approximation ratio (A.R.), defined as:
(9)
(10)
A.R. = ⟨⟩* − ⟨ ⟩max ,</p>
        <p>⟨⟩min − ⟨ ⟩max
where ⟨⟩min and ⟨⟩max represent the minimum and maximum values of the achievable expectations,
respectively, and ⟨⟩* is an estimate of the expectation, as defined in (1). The intuition behind this
metric is that the ratio ⟨⟩* approaches 1 when ⟨⟩* closely approximates ⟨⟩min, indicating that the
⟨⟩min
circuit allows to sample with high probability states whose energy values are close to the minimum. To
guarantee that both the numerator and the denominator of the ratio have negative values, we subtract
⟨⟩max from both quantities. This ensures that the final approximation ratio falls between 0 and 1,
with closer proximity to 1 when the solution quality is higher. It is worth noting that in the case of the
Maximum Cut problem, ⟨⟩max = 0.</p>
        <p>However, the approximation ratio alone does not indicate if the constraints in Minimum Vertex
Cover and Maximum Clique are satisfied. Hence, in Table 1, we include a threshold corresponding
to the lowest approximation ratio of feasible solutions. Circuits with approximation ratios below this
threshold are more likely to sample infeasible solutions. Notice that, for the unconstrained Maximum
Cut problem, the feasibility threshold is 0.</p>
        <p>Finally, we analyze the composition of the final circuit, reporting the number of single and two-qubit
gates, as well as the circuit depth. These metrics are computed after expressing the circuit in terms of
the gates {, , , }, where  denotes the CNOT gate, and and applying some simplifications
using the Qiskit transpiler with optimization level 1, which is the default setting.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Results and Discussion</title>
      <p>In this section, we discuss the results obtained by executing the RLVQC algorithm according to the
experimental protocol outlined in Section 4, using the metrics detailed in Section 4.4. To establish a
baseline for comparison, we also run QAOA with  = 1 on the same problem instances and analyze
the same set of metrics. Since QAOA is a stochastic algorithm which strictly depends on the initial
parameters, we perform 10 executions of the algorithm for each problem instance. We choose random
initial parameters for each run, and then average the approximation ratios obtained. This comparative
analysis is presented in Section 5.1.</p>
      <p>Notably, during training on the Maximum Cut problem, the agent discovered circuits with a common
structure and high approximation ratios. This observation led us to identify a novel family of ansatzes,
which we denote as -connected circuits, as discussed in Section 5.2. In particular, we conduct an
analysis of the most intuitive circuit within this family, which we refer to as Linear circuit, by examining
its performance across a broader set of graph topologies and diferent problem types. Finally, in Section
5.3, we discuss the straightforward implementability of the -connected circuits.</p>
      <sec id="sec-5-1">
        <title>5.1. RLVQC Results</title>
        <p>In Table 1 we report the results relative to the solutions and circuits found with RLVQC and QAOA with
 = 1 in terms of the metrics described in Section 4.4. Specifically, the table shows the approximation
ratio (A.R.) of the solutions, the approximation ratio threshold (A.R. Thresh.) for feasibility, the number
of single and two-qubit gates, and the circuit depth.</p>
        <p>By examining Table 1, it can be noticed that the approximation ratios found by RLVQC is significantly
influenced by the type of optimization problem being addressed. RLVQC consistently outperforms
QAOA in Maximum Cut and Minimum Vertex Cover instances, with one exception observed in the
case of the star-topology graph instances with  = 14 vertices for both problem types, where RLVQC
performs worse. Moreover, for the Minimum Vertex Cover instance with star-topology and  = 14,
RLVQC falls below the threshold for feasibility in terms of approximation ratio. The particularly strong
performance of RLVQC on Maximum Cut instances, especially on instances with  = 8 qubits where
RLVQC consistently achieves an approximation ratio of 0.99, prompted us to conduct a deeper analysis of
these circuits. Interestingly, we observed that the agent exclusively added  rotations to these circuits
during their construction, without performing any single-qubit actions. This observation suggests that
circuits containing  rotations may be particularly efective for addressing Maximum Cut problems.
In contrast, for Maximum Clique instances, RLVQC consistently yields lower approximation ratios
compared to QAOA, except for the instance built from a 2D-grid-topology graph with  = 8 vertices.
However, neither algorithm attains satisfactory results, as the threshold for feasibility is never met for
any graph topology.</p>
        <p>Upon analyzing the count of single and two-qubit gates, as well as the circuit depth, we observe
that RLVQC generally employs a number of gates comparable to QAOA but exhibits a higher depth.
Notably, there are cases where the RLVQC circuit has a gate count that is significantly lower than in</p>
        <p>A.R.</p>
        <p>RLVQC</p>
        <p>A.R. Thresh.</p>
        <p>Single-qubit
RLVQC QAOA</p>
        <p>Two-qubit
RLVQC QAOA</p>
        <p>Depth
RLVQC QAOA
8
14</p>
        <p>Maximum
Cut
Maximum
Clique
Minimum
Vertex
Cover
Maximum
Cut
Maximum
Clique
Minimum
Vertex
Cover
3-regular
2D-grid
Star graph
3-regular
2D-grid
Star graph
3-regular
2D-grid
Star graph
3-regular
2D-grid
Star graph
3-regular
2D-grid
Star graph
3-regular
2D-grid
Star graph</p>
        <p>QAOA, yet the depth increases. An example is given by Maximum Clique with  = 14, where the
count of two-qubit gates amounts to less than 25% of the two-qubit gates used by QAOA. Finally, it is
important to notice that, while circuits produced by RLVQC tend to have higher depth compared to
QAOA, adjusting the  hyperparameter in the reward function (8) allows for further penalizing deep
circuits. This adjustment may guide RLVQC toward the preference of low-depth circuits.</p>
        <p>In summary, RLVQC successfully found solutions with approximation ratios comparable to QAOA,
or superior in the case of the Maximum Cut problem. It is noteworthy that these results are obtained
using a relatively straightforward design for the RL agent, environment and reward. Therefore, there is
a considerable potential to further improve these results by better understanding how to design these
components for quantum computing tasks.
5.2. -connected Ansatzes
Here we describe the family of -connected ansatzes discovered during training on the Maximum
Cut problem. This family is composed of circuits with an initial layer of Hadamard gates, followed by
 − 1  rotations, which are applied such that each new  adds only one qubit to the “chain” of
connected qubits. More formally, calling  the graph representing the connectivity of the circuit after
applying the first   rotations, the  graphs associated to each -connected ansatz are connected
for all  = 1, . . . ,  − 1.</p>
        <p>Recall that a  gate is equivalent to two  gates with a  rotation between them on the target
qubit, as shown below:
=
( )
Using the more general definition of double rotations provided in (7) and the above expansion of the
 gate, we have that  rotations can be decomposed as follows:
( )
=
( 2 )</p>
        <p>A notable property of -connected circuits is that two basis states that have all bits flipped w.r.t.
to one another have the same probability of being measured. This characteristic makes the circuits
especially well-suited for problems that possess this symmetry property, such as the Maximum Cut
problem.</p>
        <p>Among the elements of the -connected family, we focus on a specific circuit, which we denote as
Linear circuit, where each  gate connects one qubit to the next one (see Fig. 4). It is worth noting
that each block of the figure represents the decomposition of a  gate as explained in Section 5.2.
The first ( 2 ) gate is omitted since it behaves as the identity when applied after a Hadamard gate.
3 :
2 :
1 :
0 :</p>
        <p>H
H
H
H
( 23)
(− 2 )
( 12)
(− 2 )</p>
        <p>The choice of analyzing this particular -connected ansatz is motivated by its uniform application
of  gates across all qubits, making it the most straightforward configuration within the family.</p>
        <p>The primary objective of our analysis is to evaluate the quality of solutions obtained by the Linear
circuit on the Maximum Cut problem, but across a broader range of underlying graph topologies. We
maintain the use of 3-regular, 2D-grid, and star graphs, and further include cycle graphs, where each
vertex is connected to exactly two other vertices, forming a closed loop. Additionally, we introduce
Erdős–Rényi graphs generated from a set of vertices, with edges independently connected based on fixed
probabilities of 0.2, 0.5, and 0.8. Subsequently, we evaluate the performance of the Linear circuit on other
problem types, specifically Maximum Clique and Minimum Vertex Cover, using the same underlying
topologies. The algorithms are tested on instances with 8, 14, and 16 vertices, which represents the
largest tractable size with our available hardware. However, we present results only for the 16-vertex
instances, as the findings for smaller instances are analogous. We compare the performance of the
Linear circuit with state-of-the-art quantum algorithms, including QAOA with depths  = 1 and  = 2,
QAOA+, and ma-QAOA (see Section 2). Each algorithm is executed 10 times on each problem instance
and the results are averaged. We choose random initial parameters for each execution.</p>
        <p>Fig. 5 represents the average approximation ratios obtained by executing the algorithms on the
Maximum Cut problem across diverse graph topologies, as described above. A notable observation is
that the Linear circuit often achieves the best approximation ratio. However, there are exceptions, such
as the Erdős–Rényi random graph with an edge probability of 0.2, where the Linear circuit performs
slightly worse than QAOA with  = 2. Moreover, on the star-topology graph, the approximation ratio
obtained by the Linear circuit is significantly lower than that obtained by all other algorithms. We
believe that further investigation into the cause of this behavior can be useful.</p>
        <p>In contrast, when applying the same algorithms to Maximum Clique and Minimum Vertex Cover
problems, we observe that the approximation ratios of the Linear circuit are lower than those achieved
by the other algorithms. Among the algorithms tested, QAOA with  = 2 consistently achieves the
highest approximation ratio across all graph topologies, except for the Erdős–Rényi graphs with edge
probabilities of 0.5 and 0.8, where QAOA with  = 1 and ma-QAOA perform the best, respectively.
These findings support our observation that -connected ansatzes are suitable for problems where
the cost of a solution remains the same if all bits are flipped, which include Maximum Cut, but not
Maximum Clique and Minimum Vertex Cover problems.</p>
        <p>Overall, we can conclude that RLVQC was able to find a reasonable ansatz, consistently capable of
ifnding higher approximation ratios than other quantum state-of-the-art algorithms on the Maximum
Cut problem across various underlining graph topologies.
5.3. Implementability of -connected Ansatzes
In this subsection we discuss the ease of implementability of the -connected ansatzes on specific real
quantum devices. In modern technologies,  rotations can be implemented with negligible error and
time for any rotation angle [68], while implementing a  rotation requires a specific application time.
Since any arbitrary rotation can be achieved by combining  and ( 2 ) rotations, superconducting
computers are often calibrated to perform only  and ( 2 ) rotations, and not the full set of 
rotations. Our -connected ansatzes are very good in such a framework, since we can observe that
the  rotations can be decomposed into  and (− 2 ) rotations, and  gates. In particular, the
only single-qubit parametric gates are the  rotations, which can be executed virtually without error.
Additionally, to implement the (− 2 ) rotations, we can substitute each of these gates with a ( 2 ).
-connected ansatzes maintain their performance after this substitution, allowing us to exploit the
calibration of the hardware.</p>
        <p>Moreover, the -connected family ofers advantages when mapping the logic circuit to the quantum
device topology. Specifically, we can select a particular element of the -connected ansatzes based on
the connectivity of the available hardware. This approach reduces the number of SWAP gates required
to connect qubits that are connected in the logical circuit but not in the hardware. This approach results
in resource savings and reduces errors caused by an increased number of gates and circuit depth.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions and Future Directions</title>
      <p>In this study, we introduced RLVQC, a Reinforcement Learning algorithm designed to learn how to
construct quantum circuits to solve optimization problems through a quantum variational approach.
Overall, RLVQC demonstrated its ability to build circuits able to generate solutions with approximation
ratios comparable to those obtained by QAOA, and particularly high in the case of the Maximum Cut
problem.</p>
      <p>Furthermore, we analyzed the -connected ansatzes, a specific family of circuits discovered by
the agent during its training on the Maximum Cut problem. Our investigation revealed that the
-connected ansatzes can achieve high approximation ratios, superior to those obtained with other
state-of-the-art algorithms on Maximum Cut instances.</p>
      <p>These results suggest that approaches employing Reinforcement Learning agents to construct
variational circuits hold promise. Given the flexibility of RL, there is potential for future research aimed at
improving each of the components of RLVQC, including the representation of the environment’s state,
the agent’s network architecture, the action set, and the reward function. Each of these elements may
be designed, for example, to take advantage of the peculiarities of a given problem, a specific hardware
or even adapt to the requirements set by the task at hand. More generally, RL looks promising for
researchers aiming to address challenges characterized by large solution spaces within the quantum
computing domain.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>We acknowledge the financial support from ICSC - “National Research Centre in High Performance
Computing, Big Data and Quantum Computing”, funded by European Union – NextGenerationEU. We
also acknowledge the support and computational resources provided by E4 Computer Engineering
S.p.A.
[9] M. M. Wauters, E. Panizon, G. B. Mbeng, G. E. Santoro, Reinforcement-learning-assisted
quantum optimization, Phys. Rev. Res. 2 (2020) 033446. URL: https://link.aps.org/doi/10.1103/
PhysRevResearch.2.033446. doi:10.1103/PhysRevResearch.2.033446.
[10] S. Khairy, R. Shaydulin, L. Cincio, Y. Alexeev, P. Balaprakash, Learning to optimize variational
quantum circuits to solve combinatorial problems, Proceedings of the AAAI Conference on
Artificial Intelligence 34 (2020) 2367–2375. URL: http://dx.doi.org/10.1609/aaai.v34i03.5616. doi: 10.
1609/aaai.v34i03.5616.
[11] J. Yao, M. Bukov, L. Lin, Policy gradient based quantum approximate optimization algorithm, 2020.</p>
      <p>arXiv:2002.01068.
[12] R. Nembrini, M. F. Dacrema, P. Cremonesi, An application of reinforcement learning for minor
embedding in quantum annealing, in: Proceedings of the International Workshop on AI for
Quantum and Quantum for AI (AIQxQIA 2024) co-located with the 23rd International Conference
of the Italian Association for Artificial Intelligence (AIxIA 2024), CEUR Workshop Proceedings,
CEUR-WS.org, 2024.
[13] M. Pirhooshyaran, T. Terlaky, Quantum circuit design search, Quantum Machine
Intelligence 3 (2021) 25. URL: https://doi.org/10.1007/s42484-021-00051-z. doi:10.1007/
s42484-021-00051-z.
[14] M. Ostaszewski, L. M. Trenkwalder, W. Masarczyk, E. Scerri, V. Dunjko, Reinforcement learning
for optimization of variational quantum circuit architectures, in: M. Ranzato, A. Beygelzimer,
Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems,
volume 34, Curran Associates, Inc., 2021, pp. 18182–18194. URL: https://proceedings.neurips.cc/
paper_files/paper/2021/file/9724412729185d53a2e3e7f889d9f057-Paper.pdf.
[15] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, J. L.</p>
      <p>O’Brien, A variational eigenvalue solver on a photonic quantum processor, Nature
Communications 5 (2014). URL: http://dx.doi.org/10.1038/ncomms5213. doi:10.1038/ncomms5213.
[16] J. Tilly, H. Chen, S. Cao, D. Picozzi, K. Setia, Y. Li, E. Grant, L. Wossnig, I. Rungger, G. H. Booth, J.
Tennyson, The variational quantum eigensolver: A review of methods and best practices, Physics
Reports 986 (2022) 1–128. URL: https://www.sciencedirect.com/science/article/pii/S0370157322003118.
doi:https://doi.org/10.1016/j.physrep.2022.08.003, the Variational Quantum
Eigensolver: a review of methods and best practices.
[17] Y. Cao, J. Romero, J. P. Olson, M. Degroote, P. D. Johnson, M. Kieferová, I. D. Kivlichan,
T. Menke, B. Peropadre, N. P. D. Sawaya, S. Sim, L. Veis, A. Aspuru-Guzik,
Quantum chemistry in the age of quantum computing, Chemical Reviews 119 (2019) 10856–
10915. URL: https://doi.org/10.1021/acs.chemrev.8b00803. doi:10.1021/acs.chemrev.8b00803.
arXiv:https://doi.org/10.1021/acs.chemrev.8b00803, pMID: 31469277.
[18] J. R. McClean, J. Romero, R. Babbush, A. Aspuru-Guzik, The theory of variational hybrid
quantumclassical algorithms, New Journal of Physics 18 (2016) 023023. URL: https://dx.doi.org/10.1088/
1367-2630/18/2/023023. doi:10.1088/1367-2630/18/2/023023.
[19] E. Farhi, J. Goldstone, S. Gutmann, A quantum approximate optimization algorithm, 2014.</p>
      <p>arXiv:1411.4028.
[20] K. Blekos, D. Brand, A. Ceschini, C.-H. Chou, R.-H. Li, K. Pandya, A. Summer, A review on quantum
approximate optimization algorithm and its variants, 2023. arXiv:2306.09198.
[21] G. E. Crooks, Performance of the quantum approximate optimization algorithm on the maximum
cut problem, 2018. arXiv:1811.08419.
[22] M. Willsch, D. Willsch, F. Jin, H. D. Raedt, K. Michielsen, Benchmarking the quantum approximate
optimization algorithm, Quantum Information Processing 19 (2020). URL: https://doi.org/10.1007%
2Fs11128-020-02692-8. doi:10.1007/s11128-020-02692-8.
[23] J. Cook, S. Eidenbenz, A. Bärtschi, The quantum alternating operator ansatz on maximum k-vertex
cover, in: 2020 IEEE International Conference on Quantum Computing and Engineering (QCE),
2020, pp. 83–92. doi:10.1109/QCE49297.2020.00021.
[24] C. Y.-Y. Lin, Y. Zhu, Performance of qaoa on typical instances of constraint satisfaction problems
with bounded degree, 2016. arXiv:1601.01744.
[25] M. Radzihovsky, J. Murphy, M. Swoford, A qaoa solution to the traveling salesman problem using
pyquil, 2019.
[26] S. Brandhofer, D. Braun, V. Dehn, G. Hellstern, M. Hüls, Y. Ji, I. Polian, A. S. Bhatia, T. Wellens,
Benchmarking the performance of portfolio optimization with qaoa, Quantum
Information Processing 22 (2022) 25. URL: https://doi.org/10.1007/s11128-022-03766-5. doi:10.1007/
s11128-022-03766-5.
[27] K. Kurowski, T. Pecyna, M. Slysz, R. Różycki, G. Waligóra, J. Weglarz, Application of quantum
approximate optimization algorithm to job shop scheduling problem, European Journal of
Operational Research 310 (2023) 518–528. URL: https://www.sciencedirect.com/science/article/pii/
S0377221723002072. doi:https://doi.org/10.1016/j.ejor.2023.03.013.
[28] R. Herrman, P. C. Lotshaw, J. Ostrowski, T. S. Humble, G. Siopsis, Multi-angle quantum
approximate optimization algorithm, Scientific Reports 12 (2022) 6781. URL: https://doi.org/10.1038/
s41598-022-10555-8. doi:10.1038/s41598-022-10555-8.
[29] M. Chalupnik, H. Melo, Y. Alexeev, A. Galda, Augmenting qaoa ansatz with
multiparameter problem-independent layer, in: 2022 IEEE International Conference on Quantum
Computing and Engineering (QCE), IEEE Computer Society, Los Alamitos, CA, USA, 2022, pp.
97–103. URL: https://doi.ieeecomputersociety.org/10.1109/QCE53715.2022.00028. doi:10.1109/
QCE53715.2022.00028.
[30] S. Sim, P. D. Johnson, A. Aspuru-Guzik, Expressibility and entangling capability
of parameterized quantum circuits for hybrid quantum-classical algorithms,
Advanced Quantum Technologies 2 (2019) 1900070. URL: https://onlinelibrary.wiley.com/
doi/abs/10.1002/qute.201900070. doi:https://doi.org/10.1002/qute.201900070.
arXiv:https://onlinelibrary.wiley.com/doi/pdf/10.1002/qute.201900070.
[31] J. Qin, Review of ansatz designing techniques for variational quantum algorithms, Journal of
Physics: Conference Series 2634 (2023) 012043. URL: https://dx.doi.org/10.1088/1742-6596/2634/1/
012043. doi:10.1088/1742-6596/2634/1/012043.
[32] J. Wurtz, P. J. Love, Classically optimal variational quantum algorithms, IEEE Transactions on</p>
      <p>Quantum Engineering 2 (2021) 1–7. doi:10.1109/TQE.2021.3122568.
[33] Y. Du, M.-H. Hsieh, T. Liu, D. Tao, Expressive power of parametrized quantum circuits, Phys.</p>
      <p>Rev. Res. 2 (2020) 033125. URL: https://link.aps.org/doi/10.1103/PhysRevResearch.2.033125. doi:10.
1103/PhysRevResearch.2.033125.
[34] F. Brozzi, G. Turati, M. F. Dacrema, Exploring the role of hamiltonian expressibility in ansatz
selection for variational quantum algorithms, in: Proceedings of the International Workshop
on AI for Quantum and Quantum for AI (AIQxQIA 2024) co-located with the 23rd International
Conference of the Italian Association for Artificial Intelligence (AIxIA 2024), CEUR Workshop
Proceedings, CEUR-WS.org, 2024.
[35] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, H. Neven, Barren plateaus in quantum
neural network training landscapes, Nature Communications 9 (2018). URL: http://dx.doi.org/10.
1038/s41467-018-07090-4. doi:10.1038/s41467-018-07090-4.
[36] A. Arrasmith, M. Cerezo, P. Czarnik, L. Cincio, P. J. Coles, Efect of barren plateaus on
gradientfree optimization, Quantum 5 (2021) 558. URL: https://doi.org/10.22331/q-2021-10-05-558. doi:10.
22331/q-2021-10-05-558.
[37] A. Arrasmith, Z. Holmes, M. Cerezo, P. J. Coles, Equivalence of quantum barren plateaus to
cost concentration and narrow gorges, Quantum Science and Technology 7 (2022) 045015. URL:
https://dx.doi.org/10.1088/2058-9565/ac7d06. doi:10.1088/2058-9565/ac7d06.
[38] M. Cerezo, P. J. Coles, Higher order derivatives of quantum neural networks with barren plateaus,
Quantum Science and Technology 6 (2021) 035006. URL: https://dx.doi.org/10.1088/2058-9565/
abf51a. doi:10.1088/2058-9565/abf51a.
[39] Z. Holmes, K. Sharma, M. Cerezo, P. J. Coles, Connecting ansatz expressibility to gradient
magnitudes and barren plateaus, PRX Quantum 3 (2022) 010313. URL: https://link.aps.org/doi/10.
1103/PRXQuantum.3.010313. doi:10.1103/PRXQuantum.3.010313.
[40] M. Larocca, P. Czarnik, K. Sharma, G. Muraleedharan, P. J. Coles, M. Cerezo, Diagnosing Barren
Plateaus with Tools from Quantum Optimal Control, Quantum 6 (2022) 824. URL: https://doi.org/
10.22331/q-2022-09-29-824. doi:10.22331/q-2022-09-29-824.
[41] T. Volkof, P. J. Coles, Large gradients via correlation in random parameterized quantum circuits,
Quantum Science and Technology 6 (2021) 025008. URL: https://dx.doi.org/10.1088/2058-9565/
abd891. doi:10.1088/2058-9565/abd891.
[42] H. R. Grimsley, S. E. Economou, E. Barnes, N. J. Mayhall, An adaptive variational algorithm for
exact molecular simulations on a quantum computer, Nature Communications 10 (2019) 3007.</p>
      <p>URL: https://doi.org/10.1038/s41467-019-10988-2. doi:10.1038/s41467-019-10988-2.
[43] H. L. Tang, V. Shkolnikov, G. S. Barron, H. R. Grimsley, N. J. Mayhall, E. Barnes, S. E. Economou,
Qubit-ADAPT-VQE: An adaptive algorithm for constructing hardware-eficient ansätze on a
quantum processor, PRX Quantum 2 (2021). URL: https://doi.org/10.1103%2Fprxquantum.2.020310.
doi:10.1103/prxquantum.2.020310.
[44] Y. S. Yordanov, V. Armaos, C. H. W. Barnes, D. R. M. Arvidsson-Shukur, Qubit-excitation-based
adaptive variational quantum eigensolver, Communications Physics 4 (2021). URL: https://doi.org/
10.1038%2Fs42005-021-00730-0. doi:10.1038/s42005-021-00730-0.
[45] C. Feniou, M. Hassan, D. Traoré, E. Giner, Y. Maday, J.-P. Piquemal, Overlap-adapt-vqe: practical
quantum chemistry on quantum computers via overlap-guided compact ansätze,
Communications Physics 6 (2023) 192. URL: https://doi.org/10.1038/s42005-023-01312-y. doi:10.1038/
s42005-023-01312-y.
[46] L. Zhu, H. L. Tang, G. S. Barron, F. A. Calderon-Vargas, N. J. Mayhall, E. Barnes, S. E. Economou,
Adaptive quantum approximate optimization algorithm for solving combinatorial problems on
a quantum computer, Phys. Rev. Res. 4 (2022) 033029. URL: https://link.aps.org/doi/10.1103/
PhysRevResearch.4.033029. doi:10.1103/PhysRevResearch.4.033029.
[47] A. G. Rattew, S. Hu, M. Pistoia, R. Chen, S. Wood, A domain-agnostic, noise-resistant,
hardwareeficient evolutionary variational quantum eigensolver, 2020. arXiv:1910.09694.
[48] D. Chivilikhin, A. Samarin, V. Ulyantsev, I. Iorsh, A. R. Oganov, O. Kyriienko, Mog-vqe:
Multiobjective genetic variational quantum eigensolver, 2020. arXiv:2007.04424.
[49] U. Las Heras, U. Alvarez-Rodriguez, E. Solano, M. Sanz, Genetic algorithms for digital quantum
simulations, Phys. Rev. Lett. 116 (2016) 230504. URL: https://link.aps.org/doi/10.1103/PhysRevLett.
116.230504. doi:10.1103/PhysRevLett.116.230504.
[50] L. Cincio, Y. Subaşı, A. T. Sornborger, P. J. Coles, Learning the quantum algorithm for state
overlap, New Journal of Physics 20 (2018) 113022. URL: https://dx.doi.org/10.1088/1367-2630/aae94a.
doi:10.1088/1367-2630/aae94a.
[51] Y. Du, T. Huang, S. You, M.-H. Hsieh, D. Tao, Quantum circuit architecture search for
variational quantum algorithms, npj Quantum Information 8 (2022) 62. URL: https://doi.org/10.1038/
s41534-022-00570-y. doi:10.1038/s41534-022-00570-y.
[52] M. Bilkis, M. Cerezo, G. Verdon, P. J. Coles, L. Cincio, A semi-agnostic ansatz with variable
structure for variational quantum algorithms, Quantum Machine Intelligence 5 (2023) 43. URL:
https://doi.org/10.1007/s42484-023-00132-1. doi:10.1007/s42484-023-00132-1.
[53] R. S. Sutton, A. G. Barto, Reinforcement learning - an introduction, Adaptive computation and
machine learning, MIT Press, 1998. URL: https://www.worldcat.org/oclc/37293240.
[54] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, O. Klimov, Proximal policy optimization algorithms,</p>
      <p>CoRR abs/1707.06347 (2017). URL: http://arxiv.org/abs/1707.06347. arXiv:1707.06347.
[55] J. Achiam, Spinning Up in Deep Reinforcement Learning, 2018.
[56] S. Giordano, M. A. Martin-Delgado, Reinforcement-learning generation of four-qubit entangled
states, Phys. Rev. Res. 4 (2022) 043056. URL: https://link.aps.org/doi/10.1103/PhysRevResearch.4.
043056. doi:10.1103/PhysRevResearch.4.043056.
[57] E.-J. Kuo, Y.-L. L. Fang, S. Y.-C. Chen, Quantum architecture search via deep reinforcement learning,
2021. arXiv:2104.07715.
[58] X. Zhu, X. Hou, Quantum architecture search via truly proximal policy optimization,
Scientific Reports 13 (2023) 5157. URL: https://doi.org/10.1038/s41598-023-32349-2. doi: 10.1038/
s41598-023-32349-2.
[59] L. Moro, M. G. A. Paris, M. Restelli, E. Prati, Quantum compiling by deep reinforcement learning,
Communications Physics 4 (2021) 178. URL: https://doi.org/10.1038/s42005-021-00684-3. doi:10.
1038/s42005-021-00684-3.
[60] Y.-H. Zhang, P.-L. Zheng, Y. Zhang, D.-L. Deng, Topological quantum compiling with reinforcement
learning, Phys. Rev. Lett. 125 (2020) 170501. URL: https://link.aps.org/doi/10.1103/PhysRevLett.
125.170501. doi:10.1103/PhysRevLett.125.170501.
[61] T. Fösel, M. Y. Niu, F. Marquardt, L. Li, Quantum circuit optimization with deep reinforcement
learning, 2021. URL: https://arxiv.org/abs/2103.07585. arXiv:2103.07585.
[62] E. Grant, L. Wossnig, M. Ostaszewski, M. Benedetti, An initialization strategy for addressing
barren plateaus in parametrized quantum circuits, Quantum 3 (2019) 214. URL: http://dx.doi.org/
10.22331/q-2019-12-09-214. doi:10.22331/q-2019-12-09-214.
[63] H. Singh, S. Majumder, S. Mishra, Benchmarking of diferent optimizers in the variational quantum
algorithms for applications in quantum chemistry, The Journal of Chemical Physics 159 (2023)
044117. URL: https://doi.org/10.1063/5.0161057. doi:10.1063/5.0161057.
[64] M. Fernández-Pendás, E. F. Combarro, S. Vallecorsa, J. Ranilla, I. F. Rúa, A study of the performance
of classical minimizers in the quantum approximate optimization algorithm, Journal of
Computational and Applied Mathematics 404 (2022) 113388. URL: https://www.sciencedirect.com/science/
article/pii/S0377042721000078. doi:https://doi.org/10.1016/j.cam.2021.113388.
[65] F. Glover, G. Kochenberger, R. Hennig, Y. Du, Quantum bridge analytics i: A tutorial on formulating
and using qubo models, Annals of Operations Research 314 (2022) 141–183. URL: https://doi.org/
10.1007/s10479-022-04634-2. doi:10.1007/s10479-022-04634-2.
[66] A. Lucas, Ising formulations of many np problems, Frontiers in Physics 2 (2014). URL: https:
//www.frontiersin.org/articles/10.3389/fphy.2014.00005. doi:10.3389/fphy.2014.00005.
[67] E. Pelofske, G. Hahn, H. Djidjev, Solving large maximum clique problems on a quantum annealer,
in: S. Feld, C. Linnhof-Popien (Eds.), Quantum Technology and Optimization Problems, Springer
International Publishing, Cham, 2019, pp. 123–135.
[68] D. C. McKay, C. J. Wood, S. Sheldon, J. M. Chow, J. M. Gambetta, Eficient z-gates for quantum
computing, Physical Review A 96 (2017). URL: http://dx.doi.org/10.1103/PhysRevA.96.022330.
doi:10.1103/physreva.96.022330.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cerezo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Arrasmith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Babbush</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. C.</given-names>
            <surname>Benjamin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Endo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fujii</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>McClean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Mitarai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cincio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Coles</surname>
          </string-name>
          ,
          <article-title>Variational quantum algorithms</article-title>
          ,
          <source>Nature Reviews Physics</source>
          <volume>3</volume>
          (
          <year>2021</year>
          )
          <fpage>625</fpage>
          -
          <lpage>644</lpage>
          . URL: http://dx.doi.org/10.1038/s42254-021-00348-9. doi:
          <volume>10</volume>
          .1038/ s42254-021-00348-9.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J.</given-names>
            <surname>Preskill</surname>
          </string-name>
          ,
          <article-title>Quantum computing in the nisq era and beyond</article-title>
          ,
          <source>Quantum</source>
          <volume>2</volume>
          (
          <year>2018</year>
          )
          <article-title>79</article-title>
          . URL: http: //dx.doi.org/10.22331/q-2018
          <source>-08-06-79</source>
          . doi:
          <volume>10</volume>
          .22331/q-2018
          <source>-08-06-79.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mularski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Gil-Fuster</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Mele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Arzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wilms</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eisert</surname>
          </string-name>
          ,
          <article-title>Exploiting symmetry in variational quantum machine learning</article-title>
          ,
          <source>PRX Quantum 4</source>
          (
          <year>2023</year>
          )
          <article-title>010328</article-title>
          . URL: https: //link.aps.org/doi/10.1103/PRXQuantum.4.010328. doi:
          <volume>10</volume>
          .1103/PRXQuantum.4.010328.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I. N. M.</given-names>
            <surname>Le</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schuhmacher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Tavernelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tacchino</surname>
          </string-name>
          ,
          <article-title>Symmetry-invariant quantum machine learning force fields</article-title>
          ,
          <year>2023</year>
          . arXiv:
          <volume>2311</volume>
          .
          <fpage>11362</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Wierichs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D. P.</given-names>
            <surname>East</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Larocca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cerezo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Killoran</surname>
          </string-name>
          ,
          <article-title>Symmetric derivatives of parametrized quantum circuits</article-title>
          ,
          <year>2023</year>
          . URL: https://arxiv.org/abs/2312.06752. arXiv:
          <volume>2312</volume>
          .
          <fpage>06752</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Turati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Ferrari</given-names>
            <surname>Dacrema</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Cremonesi</surname>
          </string-name>
          ,
          <article-title>Benchmarking adaptative variational quantum algorithms on qubo instances</article-title>
          ,
          <source>in: 2023 IEEE International Conference on Quantum Computing and Engineering (QCE)</source>
          , volume
          <volume>01</volume>
          ,
          <year>2023</year>
          , pp.
          <fpage>407</fpage>
          -
          <lpage>413</lpage>
          . doi:
          <volume>10</volume>
          .1109/QCE57702.
          <year>2023</year>
          .
          <volume>00053</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Claudino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wright</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J.</given-names>
            <surname>McCaskey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Humble</surname>
          </string-name>
          ,
          <article-title>Benchmarking adaptive variational quantum eigensolvers, Frontiers in Chemistry 8 (</article-title>
          <year>2020</year>
          ). URL: https://www.frontiersin.org/articles/10.3389/ fchem.
          <year>2020</year>
          .
          <volume>606863</volume>
          . doi:
          <volume>10</volume>
          .3389/fchem.
          <year>2020</year>
          .
          <volume>606863</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mukherjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Berthusen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Getelina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Orth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-X.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <article-title>Comparative study of adaptive variational quantum eigensolvers for multi-orbital impurity models</article-title>
          ,
          <source>Communications Physics 6</source>
          (
          <year>2023</year>
          )
          <article-title>4</article-title>
          . URL: https://doi.org/10.1038/s42005-022-01089-6. doi:
          <volume>10</volume>
          .1038/s42005-022-01089-6.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>