=Paper= {{Paper |id=Vol-3008/paper3 |storemode=property |title=Towards a Quantum Algorithm for Evaluating WCETs |pdfUrl=https://ceur-ws.org/Vol-3008/paper3.pdf |volume=Vol-3008 |authors=Gabriella Bettonte,Stéphane Louise,Renaud Sirdey |dblpUrl=https://dblp.org/rec/conf/qce/BettonteLS21 }} ==Towards a Quantum Algorithm for Evaluating WCETs== https://ceur-ws.org/Vol-3008/paper3.pdf
Towards a quantum algorithm for evaluating WCETs
Gabriella Bettontea , Stéphane Louisea and Renaud Sirdeya
a
    Université Paris-Saclay, CEA List


                                         Abstract
                                         In this paper we propose a quantum-based solution to the problem of counting the cache hits, an impor-
                                         tant issue when analyzing real-time embedded applications. This field has already seen the development
                                         of “quantum-inspired” classical algorithms which are competitive with the state of the art. We designed
                                         a polynomial-time dynamic programming algorithm for computing the lowest number of cache hits
                                         realized by a deterministic sequence of memory accesses, in the presence of preemptions. Our contri-
                                         bution consists in porting that algorithm to the quantum framework, improving the complexity of the
                                         algorithm from 𝑂(𝑁 3 ) to 𝑂(𝑁 2 + 𝑁 ).

                                         Keywords
                                         WCETs, preemption, quantum computing, dynamic programming.




1. Introduction
For the most part, the interest addressed to quantum computing comes from the ability of
qubits to store a superposition state that reflects all the possible inputs/outputs of a given
algorithm, until a measurement is done. This property is called quantum parallelism and can, in
certain cases, give a massive performance boost for quantum algorithms. But the advantages of
quantum computing do not come without caveats: only some classes of problems can be solved
by quantum computing, with a definite gain in terms of efficiency with respect to classical
computing. Indeed, one important research issue related to quantum computing is defining
with precision such problems. Yet, within the variety of quantum algorithms we can identify
two main design blueprints. The first one is defined by quantum algorithms capable to reach
an exponential speedup over classical algorithms on precisely defined and heavily structured
mathematical problems, like for instance Shor’s algorithm for integer factorization. The second
one is defined by quantum algorithms with a quadratic speedup over classical ones, like for
instance Grover’s algorithm for searching an unstructured array. Still, a lot of questions remain
open about the real world applications and benefits of quantum computing, especially for more
arbitrary problems and fields without any a priori “quantum-friendlyness”.
   In this paper, we consider the problem of worst-case execution time (WCET) evaluation by
static analysis of programs. The WCET analysis is highly relevant to the design of safety-
critical real-time systems, which needs to respect all the timing constraints they are specified
to meet for the verification of their safety properties. Also, WCET analysis has recently seen the
developments of “quantum-inspired” classical algorithms which are competitive (in terms of

2nd Quantum Software Engineering and Technology Workshop | IEEE Quantum Week 2021

                                       © 2021 Copyright for this paper by its authors.
                                       Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                       CEUR Workshop Proceedings (CEUR-WS.org)
    CEUR
                  http://ceur-ws.org
    Workshop      ISSN 1613-0073
    Proceedings




                                                                                                         66
precision and efficiency) with state-of-the-art approaches. Furthermore, the problems arising in
WCET evaluation cover a wide-range of complexity classes, from undecidability in the general
case down to 𝑁 𝑃-hardness and polynomial-time solvability in some restricted cases [17]. As
such, it appears to provide a relevant playground to put the quantum computing promise to
the test, although other fields may be as relevant and should be explored as well.
   In this direction, in this paper, we tackle only a restricted setup with the most simple pro-
gram model: the evaluation of the worst-case number of cache misses of programs performing
deterministic sequences of memory accesses, in case of arbitrary preemptions. A preemption
is the act of interrupting one task to allow the execution of another task on a machine. There is
indeed a strong connection between the number of cache misses done by program and its exe-
cution time, as uncached memory accesses are highly time-consuming on modern processors.
In other words, we consider a single path program (linear sequences of instructions) and the
complexity of the model comes from the arbitrary-placed interruptions, due to other programs
running on the same system, which create interferences in the cache memory. For our model,
we consider 𝐾 preemption points and they can occur at any time in the sequence, making the
cache behaviour non-predictable.
   In this paper, we propose a dynamic programming classical algorithm, with polynomial-
time complexity, to compute the minimal number of cache hits in the sequence of memory
accesses, as a proxy measure of the WCET for the considered program. Although our model
is standard the WCET community, we consider it, as well as the classical polynomial-time
algorithm solving it, as the basis to derive a lower complexity (still polynomial-time) hybrid
quantum-classical algorithm. In doing so, we demonstrate a first benefit of explicitly using the
quantum computing paradigm to the field of WCET calculations, albeit in simplified setting.
   Dynamic programming is an algorithmic technique to solve a problem consisting in find-
ing one optimal solution by solving a family of easier sub-problems. We designed a dynamic
programming algorithm to compute the minimal number of cache hits while executing a de-
terministic sequence with 𝐾 preemptions. This technique to solve WCETs is consistent with
others classical solutions in the literature [20]. Still, we want to emphasize that designing the
best classical algorithm to solve the considered problem is not the point of our work: we are
focused on the portability of classical algorithms to the quantum framework.
   This paper is organized as follows: Section II provides a brief overview about cache memories
and preemptions. Then, Section III places the paper in the context of the state-of-the-art, and
presents our program model. In Section IV we propose a dynamic programming algorithm for
evaluating WCET. Section V then contains the quantum version of that algorithm. Lastly, in
Section VI we compare the complexity of the two algorithms and in Section VII we provide
some perspectives for future works.


2. Background on cache and preemption
Cache memories impact the variability of execution times, since the access time between an
element stored in the cache memories and an element which is not, can be up to two orders of
magnitude. As cache memories are limited in size, the hardware uses the history of previous
accesses in the cache to decide which currently stored elements should be replaced when a new




                                               67
one must be stored in the cache memory. For instance, LRU (Least Recently Used) cache policy
privileges, as replacement candidate, the oldest used line of memory in a set. In the general
case, when an access to an element in memory performed by a program is already in the cache,
we call it a “cache hit”, otherwise we say that a “cache miss” occurred. Cache misses impact on
the execution time because the missing element needs to be fetched from the main memory,
which induces additional delays.
   The advantage of preemptions is the possibility to make optimal utilization of the computing
power. In particular, in fully preemptive systems, as the one we present in this paper, the run-
ning task can be interrupted at any time by another task with higher priority and be resumed
to continue when all higher priority tasks have completed [6]. When the task is preempted,
the memory blocks corresponding to the task are usually considered as flushed from the cache
memory1 , between the time the task is preempted and the time the task resumes execution.
Therefore a substantial amount of time is spent to restore the previous content of the cache,
greatly increasing the task’s execution time [3]. In certain cases, preemption has not a great
impact because it happens at a moment of the program’s execution in which data stored into
the cache are not useful: for instance, before a cache miss. Still, in general, preemption dam-
ages program locality and therefore it causes a degradation of system predictability, making
WCETs not easy to characterize and predict [4] [5] [3] [6].


3. Deterministic memory access with preemptions
We consider programs that perform deterministic accesses to the memory, that can be inter-
rupted at any time, using the count of cache misses as a first proxy evaluation of the WCET
(or as the literature calls it, the Cache memory preemption delay). A “dual” approach has al-
ready been explored in the literature, with papers about applying static analysis techniques
to quantum algorithms to evaluate their performance and formally analyze their functional
properties [18, 19]. On the other hand, the applications of quantum computing to improve the
static analysis of programs has been only scarcely explored [1] and even lesser so is the issue
of static analysis of cache misses and WCET where only “quantum-inspired” classic algorithm
have been proposed [2]. So a lot of questions remain open about the applications and benefits
of quantum computing for software engineering issues [1]. Numerous research works exist in
the direction of improving classical polynomial algorithms with a quantum inspired approach,
allowing to gain a polynomial factor of complexity [7, 8, 9, 10, 11, 12, 13, 14]. In this paper, we
propose a dynamic programming algorithm for computing the minimal number of cache hits
and improve it into a quantum-classic hybrid version, leading to a lower complexity.
   As model case, we take into consideration the problem of evaluating the WCET of a deter-
ministic sequence of memory accesses, in case of preemptions. We denote the sequence of
memory accesses as 𝑎0 , 𝑎1 , … , 𝑎𝑁 −1 and we are supposed to know if each of them is a cache
miss or a cache hit in an execution without preemptions (this is done in linear time as a pre-


   1
     Whilst it is possible that, for a given set of tasks, preemption would preserve some cache lines, the situation
when one wants to calculate WCET of a task in the general case requires the hypothesis that no useful cache line
would be retained by a preemption.




                                                        68
processing step). We denote 𝑋 [𝑖] the 𝑖-th access of the sequence and
                                           {
                                            1 if cache miss
                                   𝑋 [𝑖] =
                                            0 if cache hit.

𝑃 is the set of possible contents of those memory accesses 𝑝0 , … , 𝑝𝑀−1 . The number of preemp-
tions interrupting the program is a fixed number 𝐾 . Preemption can happen at any time, this
leads to non-predictability of the model. We suppose also that when a preemption occurs, all
the content of the cache is flushed from it.


4. Our classical starting point
We designed a dynamic programming algorithm to compute the minimal number of cache hits
while executing a deterministic sequence with 𝐾 preemptions. The minimal number of cache
hits corresponds to the maximal number of cache misses.
   First at all, we scan the sequence of memory accesses, as it would be without any point of
preemption, and we store the information of being a cache hit or miss into an array 𝑋 : if the
considered access is cache miss 𝑋 [𝑖] = 1, 𝑋 [𝑖] = 0 otherwise. We compute also the 𝑁𝑖 [𝑝𝑡 ] > 𝑖,
the table of indexes of the first access to the object 𝑝𝑡 , after the 𝑖-th access, into the sequence
of memory access. If there is no other access to 𝑝𝑡 , we write ∞.
   We suppose to know the solution of the problem for all the sequence 𝑎𝑖+1 , ⋯ , 𝑎𝑁 −1 and the
number of preemptions 𝑘 ′ , where 0 < 𝑘 ′ < 𝐾 . In other words, it is known:

    • 𝑆(𝑖 + 1, 𝑝𝑡, 𝑘 ′ ): the smallest number of cache hits for the object 𝑝𝑡 and 𝑘 ′ preemptions;

    • 𝑌 (𝑖 + 1, 𝑘 ′ ) > 𝑖 + 1 : index of the next preemption in this solution.

    Algorithm 1 describes how to determine 𝑆(𝑖, 𝑝𝑡 , 𝑘 ′ ) and 𝑌 (𝑖, 𝑘 ′ ) from the previous data, in
case of cache miss (i.e if 𝑋 [𝑖] = 1)
    Now, Algorithm 2 describes how to determine 𝑆(𝑖, 𝑝𝑡 , 𝑘 ′ ) and 𝑌 (𝑖, 𝑘 ′ ) from the previous data,
in case of cache hit (i.e if 𝑋 [𝑖] = 0).
    After performing the algorithm we obtain the smallest number of cache hits for each object
𝑝𝑡 , when 𝐾 preemptions occur, in symbols 𝑆(0, 𝑝𝑡 , 𝐾 ). ∑𝑝𝑡 𝑆(0, 𝑝𝑡 , 𝐾 ) is the solution for the
problem of computing the smallest number of cache hits for the sequence of memory accesses.

4.1. Example of application of the classic algorithm
As example, we take into consideration a cache of size 2, with 𝐾 = 2 (i.e two preemptions
occur). The set of possible contents of the cache is 𝑃 = {𝑝0 , 𝑝1 , 𝑝2 } = {𝐴, 𝐵, 𝐶}. The sequence
of memory accesses is 𝑎0 𝑎1 𝑎2 𝑎3 𝑎4 = 𝐴𝐵𝐴𝐵𝐶. The first two accesses and the last one are cache
misses, while the third and fourth accesses are cache hits. It is simple to see that the worst
scenario is when the two points of preemption happen immediately before the two cache hits,
because in this case the content of the cache is flushed and 𝐴 and 𝐵 has to be restored into the
cache when re-called lately. In this scenario the algorithm returns 𝑆0 = 0 as minimal number




                                                  69
 Algorithm 1: Cache hits counter - cache miss case
  Result: The choice between A and B, for a fixed 𝑘 ′ , is the one that minimize
            ∑𝑘 ′ 𝑆(𝑖, 𝑝𝑡 , 𝑘 ′ )
  𝑖 index in the sequence
  0 < 𝑘 ′ < 𝐾 number of preemptions;
  case A no preemption
  for all 𝑝𝑡 in P
      for all 𝑘 ′
        if 𝑝𝑡 = 𝑎𝑖 then 𝑆(𝑖, 𝑎𝑖 , 𝑘 ′ ) = 1 + 𝑆(𝑖 + 1, 𝑎𝑖 , 𝑘 ′ )
        𝑆(𝑖, 𝑝𝑡 , 𝑘 ′ ) = 𝑆(𝑖 + 1, 𝑝𝑡 , 𝑘 ′ )
         𝑌 (𝑖, 𝑘 ′ ) = 𝑌 (𝑖 + 1, 𝑘 ′ )
        end
  case B preemption
  for all 𝑝𝑡 in P
      for all 𝑘 ′
        if 𝑋 [𝑁𝑖 [𝑝𝑡 ]] = 0 and 𝑌 (𝑖 + 1, 𝑘 ′ − 1) > 𝑁𝑖 [𝑝𝑡 ]
        then
         𝐵 = 𝐵1 = 𝑆(𝑖, 𝑝𝑡 , 𝑘 ′ ) = 𝑆(𝑖 + 1, 𝑝𝑡 , 𝑘 ′ − 1) − 1
         else
         𝐵 = 𝐵2 = 𝑆(𝑖, 𝑝𝑡 , 𝑘 ′ ) = 𝑆(𝑖 + 1, 𝑝𝑡 , 𝑘 ′ − 1)
         𝑌 (𝑖, 𝑘 ′ ) = 𝑖


of cache hits, meaning that all the accesses of the sequence are cache misses. If a preemption
happens before a cache miss it does not impact in the performances.
   After the algorithm performs the first scan of the sequence of memory accesses we have
𝑋 = 11001. The table 𝑁 is:

                                  i \t       0 (A)   1 (B)   2 (C)
                                  0          2       1       4
                                  1          2       3       4
                              𝑁 =
                                  2          ∞       3       4
                                  3          ∞       ∞       4
                                  4          ∞       ∞       ∞
  We give below the details only of the two first iterations, being the most representative. In
order to obtain the final result, it suffices to execute again the algorithm, until the sequence of
accesses is completed.




                                                70
Algorithm 2: Cache hits counter - cache hit case
 Result: The choice between A and B, for a fixed 𝑘 ′ , is the one that minimize
           ∑𝑘 ′ 𝑆(𝑖, 𝑝𝑡 , 𝑘 ′ )
 𝑖 index in the sequence
 0 < 𝑘 ′ < 𝐾 number of preemptions;
 case A no preemption
 for all 𝑝𝑡 in P
     for all 𝑘 ′
        if 𝑝𝑡 = 𝑎𝑖
       𝐴 = 𝑆(𝑖, 𝑎𝑖 , 𝑘 ′ ) = 1 + 𝑆(𝑖 + 1, 𝑎𝑖 , 𝑘 ′ )
       else
       𝐴 = 𝑆(𝑖, 𝑝𝑡 , 𝑘 ′ ) = 𝑆(𝑖 + 1, 𝑝𝑡 , 𝑘 ′ )
       𝑌 (𝑖, 𝑘 ′ ) = 𝑌 (𝑖 + 1, 𝑘 ′ )
 case B preemption
 for all 𝑝𝑡 in P
     for all 𝑘 ′
        if (𝑋 [𝑁𝑖 [𝑝𝑡 ]] = 0 and 𝑌 (𝑖 + 1, 𝑘 ′ − 1) > 𝑁𝑖 [𝑝𝑡 ]and 𝑝𝑡 ≠ 𝑎𝑖 )
        𝐵 = 𝐵1 = 𝑆(𝑖, 𝑝𝑡 , 𝑘 ′ ) = 𝑆(𝑖 + 1, 𝑝𝑡 , 𝑘 ′ − 1) − 1
        else
        𝐵 = 𝐵2 = 𝑆(𝑖, 𝑝𝑡 , 𝑘 ′ ) = 𝑆(𝑖 + 1, 𝑝𝑡 , 𝑘 ′ − 1)
        𝑆(𝑖, 𝑎𝑖 , 𝑘 ′ ) = 𝑆(𝑖 + 1, 𝑎𝑖 , 𝑘 ′ − 1)
        𝑌 (𝑖, 𝑘 ′ ) = 𝑖


  • a4=C
                                                    t \k’   0       1       2
                                                    0       0       0       0
                                                    (A)
                              𝑆(𝑖 = 4, 𝑝𝑡 , 𝑘 ′ ) = 1       0       0       0
                                                    (B)
                                                    2       0       0       0
                                                    (C)
                                                     k’     0   1       2
                                𝑌 (𝑖 = 4, 𝑘 ′ ) =
                                                            ∞   4       4

  • a3=B




                                                71
      Without preemption:

                                                      t \k’          0       1       2
                                                      0              0       0       0
                                                      (A)
                               𝐴 = 𝑆(𝑖 = 3, 𝑝𝑡, 𝑘 ) = 1
                                                 ′
                                                                     1       1       1
                                                      (B)
                                                      2              0       0       0
                                                      (C)

      With preemption:
                                                         t \k’       0       1       2
                                                         0           -       0       0
                                                         (A)
                               𝐵 = 𝑆(𝑖 = 3, 𝑝𝑡 , 𝑘 ′ ) =
                                                         1 (B)       -       1       1
                                                         2           -       0       0
                                                         (C)
      Here, note that "-" is equivalent to 0 for our purpose. Final table:

                                                       t \k’     0       1       2
                                                       0         0       0       0
                                                       (A)
                                 𝑆(𝑖 = 3, 𝑝𝑡 , 𝑘 ′ ) = 1         1       1       1
                                                       (B)
                                                       2         0       0       0
                                                       (C)

                                                         k’    0     1       2
                                    𝑌 (𝑖 = 3, 𝑘 ′ ) =
                                                               ∞     3       3

5. Going quantum
In this section we present the “quantum version” of the previous algorithm. We suppose to
already know the behaviour of the cache in case of program’s execution without preemptions
(i.e the array 𝑋 ) and the table 𝑁 , both deterministic. While searching for the minimal number
of cache miss, in the classic algorithm, we perform three nested loops:

    • over the sequence memory accesses (𝑁 iterations)

    • over all the possible objects that can be into the cache (𝑁 iterations)

    • over all the 𝑘 ′ with 0 < 𝑘 ′ < 𝐾 (𝐾 iterations)

To compute the solution (i.e. the number of cache hits corresponding to 𝐾 preemption and
the last memory access) the algorithm needs all the number of cache hits corresponding to 𝑘 ′




                                                    72
with 0 < 𝑘 ′ < 𝐾 . We reformulated the dynamic programming classic algorithm into an hybrid
quantum-classical algorithm in which the third loop (the one over the number of preemptions)
is suitable to be executed on a quantum computer.
     In particular, we will give below the details of the first case (when cache miss happens) of
our quantum algorithm, being the second case (when cache hit happens) similar. We build a
superposed state represented as |𝑘 ′ , 𝑆⟩ where 𝑘 ′ is the number of considered preemptions and
𝑆 the corresponding number of cache hits. In other words, for each number of preemptions
𝑘 ′ , it is known the minimum possible number of cache hits while performing the sequence
of memory accesses. Considering, for simplicity of notation, the case of our example, 𝑆 can
be either 0 or 1 so we build the superposed state ∑𝑘 ′ 𝑎𝑝 |𝑘 ′ , 0⟩ + 𝑏𝑝 |𝑘 ′ , 1⟩, for each 𝑝 ∈ 𝑃. We
recall that 𝑃 is the set of possible contents of the memory slots. Since each block (a 𝑘 ′ iteration)
is independent from the others, we can consider them one by one. Therefore we perform a
projection on the 𝑘 ′ , so we represent the qubits as 𝑞𝑝 = 𝑎𝑝 |0⟩ + 𝑏𝑝 |1⟩, with 𝑎𝑝 , 𝑏𝑝 ∈ {0, 1} and
𝑎𝑝 + 𝑏𝑝 = 1. If the considered memory access is a cache miss (i.e 𝑋 [𝑖] = 1), a preemption may
(case B) or may not (case A) occur before accessing it. When a preemption does not occur (case
A), we simply apply the identity matrix to the superposed state. When a preemption occurs
(case B) the situation is more complex and we need to define two different operators, 𝐺 and 𝐷.
Those operators are made of the sub-operators 02 , 𝐼2 , 𝑆2 , applied to the single blocks.

                                       0 0     1 0     0 1
                               02 =        𝐼 =     𝑆 =                                             (1)
                                      [0 0] 2 [0 1] 2 [1 0]

                                        ⎡02 𝐼2 02 ⋯             02 ⎤
                                        ⎢                          ⎥
                                         0 0 𝐼 ⋯                02 ⎥
                                      𝐷=⎢ 2 2 2
                                        ⎢⋮     ⋮  ⋮ ⋱           02 ⎥
                                        ⎢ 𝐼 2 02 02 02          02 ⎥
                                        ⎣                          ⎦
                               ⎡02 𝐼2 02 ⋯ 02 ⎤ ⎡𝑆2 02 02 02 ⎤
                               ⎢                  ⎥ ⎢                 ⎥
                               ⎢02 02 𝐼2 ⋯ 02 ⎥ ⎢02 𝑆2 03 02 ⎥
                          𝐺=                       ×
                               ⎢⋮    ⋮  ⋮ ⋱ 02 ⎥ ⎢ ⋮       ⋮ ⋱ ⋮⎥
                               ⎢ 𝐼2 02 02 02 02 ⎥ ⎢02 02 02 𝑆2 ⎥
                               ⎣                  ⎦ ⎣                 ⎦
  In order to maintain reversibility we need to compute all the more outputs of the algorithm
respect to the classical algorithm and keep track of them. Starting from the superposed state,
we will apply the identity operator to obtain 𝐴, we apply 𝐺 to obtain 𝐵1 and 𝐷 to obtain 𝐵2
  The first oracle 𝑂1 has to choose between 𝐵1 and 𝐵2, always block by block:

                        𝑂1 ∶ |𝑞1 , 𝑞2 , ⋯ , 𝑞𝑀 , 𝑞𝑀+1 , 𝑞𝑀+2 , ⋯ , 𝑞2𝑀 , 𝑐, 𝑦⟩ ⟶
                               |𝑞1 𝑓 ⊕ 𝑞𝑀+1 (𝑓 ⊕ 1), 𝑞2 𝑓 ⊕ 𝑞𝑀+2 (𝑓 ⊕ 1), ⋯ ,
                                   𝑞𝑀 𝑓 ⊕ 𝑞2𝑀 (𝑓 ⊕ 1), 𝑞1 (1 ⊕ 𝑓 ) ⊕ 𝑞𝑀+1 𝑓 ,
                                              𝑞2 (1 ⊕ 𝑓 ) ⊕ 𝑞𝑀+2 𝑓 ,
                                        ⋯ , 𝑞𝑀 (1 ⊕ 𝑓 ) ⊕ 𝑞2𝑀 𝑓 , 𝑐 ⊕ 𝑓 , 𝑦⟩
where:
                                         𝑞𝑖 = 𝑎𝑝 |0⟩ + 𝑏𝑝 |1⟩




                                                  73
and 𝑐 a boolean. We have also:
                                   𝑦 = 𝑐0 |∞⟩ + 𝑐1 |1⟩ + ⋯ + 𝑐𝑁 |𝑁 ⟩
where 𝑐0 , 𝑐1 , ⋯ , 𝑐𝑁 ∈ {0, 1} and 𝑐0 + 𝑐1 , ⋯ + 𝑐𝑁 = 1. Finally:
                                                      {
                                                        1 if condition is true,
                            𝑓 = 𝑓 = (𝑞1 , ⋯ , 𝑞2𝑀 ) =
                                                        0 otherwise
                                  with condition true if: 𝑦 > 𝑁𝑖 (𝑝𝑡 )
.
  Now, calling 𝐵 the result of the choice between 𝐵1 and 𝐵2, we need to choose between 𝐴
and 𝐵, picking the option that minimizes the number of cache hits. The oracle 𝑂2 is an oracle
choosing between A and B.
                       𝑂2 ∶       |𝑞1 , 𝑞2 , ⋯ , 𝑞𝑀 , 𝑞𝑀+1 , 𝑞𝑀+2 , ⋯ , 𝑞2𝑀 , 𝑐⟩ ⟶
                                   |𝑞1 𝑓 ⊕ 𝑞𝑀+1 (𝑓 ⊕ 1), 𝑞2 𝑓 ⊕ 𝑞𝑀+2 (𝑓 ⊕ 1), ⋯ ,
                                     𝑞𝑀 𝑓 ⊕ 𝑞2𝑀 (𝑓 ⊕ 1), 𝑞1 (1 ⊕ 𝑓 ) ⊕ 𝑞𝑀+1 𝑓 ,
                               𝑞2 (1 ⊕ 𝑓 ) ⊕ 𝑞𝑀+2 𝑓 , ⋯ , 𝑞𝑀 (1 ⊕ 𝑓 ) ⊕ 𝑞2𝑀 𝑓 , 𝑐 ⊕ 𝑓 ⟩
    where
                                           𝑞𝑖 = 𝑎𝑝 |0⟩ + 𝑏𝑝 |1⟩
and 𝑐 is a boolean. Then, we have:
                                                    {
                                                     1     if condition is true,
                          𝑓 = 𝑓 = (𝑞1 , ⋯ , 𝑞2𝑀 ) =
                                                     0     otherwise
                       with condition true if: 𝑞1 + ⋯ + 𝑞𝑀 < 𝑞𝑀+1 + ⋯ + 𝑞2𝑀
where "+" is the usual sum over qubits, executable by a Quantum Adder [15], [16].
 In Fig. 1 it is represented the quantum algorithm’s flow.

5.1. Example of application of the quantum algorithm
In this section we give a glimpse of the functioning of our algorithm though a simple example.
We suppose to have a sequence of memory accesses of length 𝑁 = 3: 𝐴𝐵𝐵. We have 𝑃 = {𝐴, 𝐵},
then 𝑀 = 2. The choice of this simplified example comes from the fact that the number of
qubits needed to perform the quantum algorithm increases quickly with the number of memory
accesses. That being said, with such a low 𝑁 and consequently low 𝐾 , it is not possible to
appreciate the advantage of the quantum algorithm in terms of complexity. We suppose that
the cache has one line. This means that in the best scenario the number of cache hits is 1, while
in the worst scenario, if a preemption occurs before the third access, the number of cache hits
is 0. Finally, we suppose 𝐾 = 1. We can build the deterministic tables 𝑁 and 𝑌 :
                                        i \t          0 (A)     1 (B)
                                        0             2         1
                                    𝑁 =
                                        1             2         ∞
                                        2             ∞         ∞




                                                     74
Figure 1: Quantum circuit for a block (i.e. for a fixed 𝑘 ′ ). Oracle 𝑂1 chooses between 𝐵1 and 𝐵2, while
𝑂2 chooses between 𝐴 and 𝐵, where 𝐵 is the answer of 𝑂1 .


We build the superposition representing all possible number of cache hits and all possible num-
ber of preemption |𝑘 ′ , 𝑆⟩ with 𝑘 ′ = {0, 1} and 𝑆 = {0, 1} for one object in 𝑃:

                               𝛼0 |0, 0⟩ + 𝛼1 |0, 1⟩ + 𝛼2 |1, 0⟩ + 𝛼3 |1, 1⟩ .

    In this superposition we have that either 𝛼𝑖 = 0 either 𝛼𝑖+1 = 0, with 𝑖 even. Then,

                                             ⎡0       0   1   0⎤
                                             ⎢                 ⎥
                                               0      0   0   1⎥
                                           𝐷=⎢
                                             ⎢1       0   0   0⎥
                                             ⎢0       1   0   0⎥
                                             ⎣                 ⎦
    and

                                             ⎡0       0   0   1⎤
                                             ⎢                 ⎥
                                               0      0   1   0⎥
                                           𝐺=⎢
                                             ⎢0       1   0   0⎥
                                             ⎢1       0   0   0⎥
                                             ⎣                 ⎦
.
   In order to build a quantum circuit, we can express the matrices 𝐷 and 𝐺 as boolean expres-
sion on the qubits 𝑎, 𝑏:
                                   𝐷 ∶ |𝑎, 𝑏⟩ ⟶ |𝑁 𝑂𝑇𝑎, 𝑏⟩
and
                                   𝐺 ∶ |𝑎, 𝑏⟩ ⟶ |𝑁 𝑂𝑇𝑎, 𝑁 𝑂𝑇 𝑏⟩




                                                    75
Figure 2: Choice between 𝐵1 and 𝐵2




Figure 3: 𝐷 and 𝐺 operators


  Now, the oracle 𝑂2 has to choose between the case 𝐴 and the case 𝐵, where 𝐵 is the result of
the first oracle 𝑂1 .
                        case A 𝛼0 |0, 0⟩ + 𝛼1 |0, 1⟩ + 𝛼2 |1, 0⟩ + 𝛼3 |1, 1⟩
                          case B 𝛼0′ |0, 0⟩ + 𝛼1′ |0, 1⟩ + 𝛼2′ |1, 0⟩ + 𝛼3′ |1, 1⟩
  The answer of the 𝑂2 oracle is 𝐴 if 𝛼1 + 𝛼3 < 𝛼1′ + 𝛼3′ , 𝐵 otherwise.

5.2. Post-processing
After applying the previous quantum-algorithm, for each block (i.e. for each 𝑘 ′ ), we have a
superposed state ∑𝑘 ′ |𝑘 ′ , 𝑆0 ⟩ where 𝑆0 is the minimal number of cache hits if 𝑘 ′ preemptions
occur and 0 < 𝑘 ′ < 𝐾 . From this superposed state we need to extrapolate the 𝑆0 corresponding
to 𝐾 preemption. The idea is to use a Grover-style searching algorithm over the superposed
state to isolate the solution (i.e 𝑆0 for 𝑘 ′ = 𝐾 ). In particular, we expect to have as input for the
oracle the superposition:

                              0 |0, 0⟩ + 0 |0, 1⟩ + ⋯ + 𝛼0 |0, 𝑆0 ⟩ + ⋯
                              +0 |𝑘 ′ , 𝑁 − 1⟩ + 0 |𝑘 ′ , 0⟩ + 0 |𝑘 ′ , 1⟩ + ⋯
                              +𝛼𝑘 ′ |𝑘 ′ , 𝑆0 ⟩ + ⋯ + 0 |𝑘 ′ , 𝑁 − 1⟩ + ⋯
                              +0 |𝐾 , 0⟩ + 0 |𝐾 , 1⟩ + ⋯ +
                              +𝛼𝐾 |𝐾 , 𝑆0 ⟩ + ⋯ + 0 |𝐾 , 𝑁 − 1⟩




                                                    76
We suppose then to build an oracle 𝑂𝐺 that selects the qubits with "index" corresponding to
𝐾 . We suppose that 𝑂𝐺 implements, in a Grover-style, an Amplitude Amplifier that amplifies
coefficients 𝛼𝑖 and leaves to zero the zeros coefficients, see Fig.4. If at the end of the algorithm
the result is |𝑥, 𝑦⟩ where 𝑥 ≠ 𝐾 , it means that the searching algorithm didn’t worked and we
should apply again the algorithm.
   In Fig. 5 we can see that the oracle takes as input superposed state of the results of the circuit
for each 𝑘 ′ in Fig. 1 and returns the minimal number of cache hits in the sequence of memory
access in case of 𝐾 preemptions.


6. Complexity analysis
In order to compute the final solution, at each iteration of the first loop, we create a table of
dimensions 𝑀 × 𝐾 , containing the number of the cache hits, for each item in 𝑃 and each 𝑘 ′ ,
with 0 < 𝑘 ′ ≤ 𝐾 . Each iteration of the second loop is independent from the others, meaning
that we can compute any row of the next table independently from the others rows. Then,
the complexity of "updating" a row (most inner loop) is 𝑂(𝐾 ). Therefore the dynamic pro-
gramming classical algorithm has a complexity of 𝑂(𝑀 × 𝑁 × 𝐾 ) (see Fig 6a). In the context of
porting classical applications to quantum computing, we designed a “hybrid quantum-classical
version” of the previous algorithm, whose output is the number of cache hits corresponding
to 𝐾 preemption. We consider a row in one table as a superposition of states representing all
the numbers of preemptions 𝑘 ′ and all the possible values of the number of cache hits. In this
way, the complexity of the act of "updating" a row becomes 𝑂(1) (see Fig. 6b). Therefore the
complexity of the resulting quantum algorithm becomes 𝑂(𝑀 × 𝑁 ). As a drawback, we need to
post-process the result we obtained at the previous step: superpositions of dimension 𝐾 × 𝑈 ,
where 𝑈 < 𝑁 is the maximum number of cache hits. However, we only need the value of
the number of cache hits when 𝐾 preemptions occur (i.e. only one value per row in the last
set of rows). Therefore, for each row, we can use Grover’s algorithm to extract the value of
interest
√        in the
            √ associated√ superposition. This means that this post-processing has complexity
  𝐾 × 𝑈 < 𝐾 × 𝑁 < 𝑁 2 = 𝑁 . Hence, the complexity of the whole quantum algorithm is
decreased to 𝑂(𝑁 2 + 𝑁 ) compared to 𝑂(𝑁 3 ) for the classical algorithm.


7. Conclusions and perspectives
In this paper, as a first step to deal with WCET-related problems by means of quantum-classic
hybrid algorithms, we worked on a highly simplified program model since we just considered
single-path programs, yet in the presence of arbitrary preemptions. A natural perspective
would be to attempt to generalize this approach to more complex program models allowing
for some non-determinism in the control-flow. However, such a generalisation would require
a careful control of the size of the quantum state representing the cache hit counters array in
order to avoid a complexity blow up at the post-processing amplification step.
   Also, in this paper, we have ported a polynomial-time dynamic programming algorithm to
the quantum framework. In essence, most such algorithms follow the same regular pattern of
several nested loops updating an array data structure with the final result obtained in only one




                                                 77
Figure 4: Amplitude amplification performed by the oracle 𝑂𝐺 . The coefficient 𝛼, the one correspond-
ing to the solution |𝐾 , 𝑆0 ⟩, is amplified while the coefficients 0s, corresponding to the non-solutions,
are not modified by the amplification. In the figure we can see how the 𝛼 is at first reversed and then
doubled, leading to an amplification, in a Grover-style fashion.


of the entries of the last array. As such, the approach in this paper, consists in turning the inner
loop of our algorithm in a “quantum parallel for” associated to a Grover-style amplification on
the single result of interest. As a consequence, the approach in this paper may potentially be
generalized to other (exponential-time or polynomial-time) dynamic programming algorithms




                                                   78
Figure 5: Post-processing circuit. The oracle 𝑂𝐺 takes as input the answer of 𝑂2 for each block, after
𝑁 application of the circuit in Fig.1. It returns as final answer the number of cache hits 𝑆𝑂 for the
considered memory accesses sequence, in case of 𝐾 preemptions.


in order to derive (polynomial-only) quantum speedups.


References
 [1] R. Hall,"A Quantum Algorithm for Software Engineering Search." 2009.
 [2] S. Louise, "A First Step Toward Using Quantum Computing for Low-Level WCETs Esti-
     mations." 2019.
 [3] C.-G.Lee, J.Hahn, Y.-M.Seo, S.L.Min, R.Ha,S.Hong, C.Y.Park, M. Lee, and C. S. Kim, “Anal-
     ysis ofcache-related preemption delay infixed-priority preemptive scheduling.” 1998.
 [4] H. Ramaprasad, F. Mueller, “Tightening the bounds on feasible pre-emption points.” 2006.
 [5] H. Ramaprasad, F. Mueller, “Bounding worst-case response timefor tasks with non-
     preemptive regions.” 2008.
 [6] G. Buttazzo, M. Bertogna, G. Yao, "Limited Preemptive Scheduling for Real-Time Systems.
     A Survey." 2013
 [7] A. Manju, M.J. Nigam, "Applications of quantum inspired computational intelligence: a
     survey".
 [8] S. B. Ramezani, A. Sommers, H. K. Manchukonda, S. Rahimi and A. Amirlatifi, "Machine
     Learning Algorithms in Quantum Computing: A Survey," 2020.
 [9] E. Tang, "A quantum-inspired classical algorithm for recommendation systems," 2019.
[10] H. Talbi, A. Draa and M. Batouche, "A new quantum-inspired genetic algorithm for solving
     the travelling salesman problem," 2004.




                                                 79
                                         (a) Classic algorithm




                                        (b) Quantum algorithm

Figure 6: In orange: the number of cache hits, for one object in 𝑃 and all 𝑘 ′


[11] D. Jethwani and F. Le Gall and Sanjay K. Singh, "Quantum-Inspired Classical Algorithms
     for Singular Value Transformation," 2020.
[12] G. Ripoll, J. Jose, "Quantum-inspired algorithms for multivariate analysis: from interpo-
     lation to partial differential equations," 2021.
[13] N. Chia and H. Lin and C. Wang. , "Quantum-inspired algorithms for multivariate analysis:
     from interpolation to partial differential equations," 2021.
[14] A. Gilyen and Z. Song and E. Tang, "An improved quantum-inspired algorithm for linear
     regression," 2020.
[15] G. Florio, D. Picca, "Quantum implementation of elementary arithmetic operations," 2004.
[16] A.V. Cherkas and S.A. Chivilikhin, "Quantum adder of classical numbers," 2016.
[17] R. Wilhelm, J. Engblom, A. Ermedahl, N. Holsti, S. Thesing, D. Whalley, G. Bernat, C. F.d
     Heckmann, T. Mitra, F. Mueller, I. Puaut, P. Puschner, J. Staschulat, and . Stenström. " The
     worst-case execution-time problem—overview of methods and survey of tools." 2008
[18] A. J. Abhari, S. Patil, D. Kudrow, J. Heckey, A. Lvov, F. T. Chong et M. Martonosi, "Scaffcc
     : a framework for compilation and analysis of quantum computing programs." 2014
[19] A. Facon, S. Guilley, M. Lec’Hvien, A. Schaub et Y. Souissi, "Detecting cache-timing vul-
     nerabilities in post-quantum cryptography algorithms." 2018
[20] J. Cavicchio, C. Tessler and N. Fisher, "Minimizing cache overhead via loaded cache blocks
     and preemption placement." 2015




                                                   80