Comparison of steepest descent method and conjugate gradient method

Oleksandra Osadcha
Faculty of Applied Mathematics
Silesian University of Technology
Gliwice, Poland
Email: oleosa338@student.polsl.pl

Zbigniew Marszałek
Institute of Mathematics
Silesian University of Technology
Gliwice, Poland
Email: Zbigniew.Marszalek@polsl.pl

Copyright © 2017 held by the authors

Abstract—This paper presents a comparison of two methods used for solving systems of linear equations. The aim of the research is to analyze which method solves these equations faster and how many iterations each method requires. The paper presents ways to deal with this problem and analyzes the two methods: the first minimizes, in a suitable norm, the deviation of the next iterate from the solution, while the other constructs orthogonal vectors and is a theoretically finite method.

Index Terms—conjugate gradient method, steepest descent method, comparison, analysis

I. INTRODUCTION

Computer algorithms are important tools for numerical processing. In all implementations it is important to make them more efficient and to decrease their complexity, but without loss of effectiveness. Cryptographic algorithms are among the most important methods in computer science, therefore their efficiency and complexity are crucial for optimal processing [14]. Knowledge systems demand fast and precise methods for information processing [4], [6]. Similarly, decision support models need dedicated techniques of lower complexity [12].

This publication presents a comparison of the steepest descent method and the conjugate gradient method. These methods are used for solving systems of linear equations. We analyze which method is faster and how many iterations each method requires. First we describe the methods, then we compare them and draw conclusions.

II. THE STEEPEST DESCENT METHOD

The steepest descent method was formulated by Stiefel. We present the mathematical description of the method and an implementation in the form of code. Consider a system of linear equations in the general form

    Ax = b.                                                        (1)

We assume that the matrix A is symmetric and positive definite. By x* we denote the solution of system (1), i.e. Ax* = b.

Gradient methods construct a sequence of successive approximations to the solution x* using the formula

    x^{(k+1)} = x^{(k)} + c_k r^{(k)}.                             (2)

The vector r^{(k)} occurring in equation (2) is called the residuum of the system and takes the following form:

    r^{(k)} = b − A x^{(k)},                                       (3)

or, equivalently,

    r^{(k+1)} = r^{(k)} − c_k A r^{(k)} = (I − c_k A) r^{(k)}.     (4)

In the iterative process this vector can therefore be determined in each iteration of the algorithm as a suitable combination of the previous values, as described by the formula

    r^{(k+1)} = r^{(k)} − c_k A r^{(k)}.                           (5)

The coefficients c_k occurring in equation (2) are determined on the basis of the following equality:

    ||x* − x^{(k+1)}||_B = inf_{c_k} ||x* − x^{(k)} − c_k r^{(k)}||_B,   (6)

where the norm ||·||_B is defined as

    ||x||_B := √(x^T B x).                                         (7)

The matrix B occurring in formula (7) is symmetric and positive definite. At the same time it fulfills the commutation condition with the matrix A, i.e. A·B = B·A. It is not difficult to show that the coefficients c_k are described by the equation

    c_k = (r^{(k)}, B(x* − x^{(k)})) / (r^{(k)}, B r^{(k)}).       (8)

Only for certain matrices B can the value of B(x* − x^{(k)}) actually be determined. For example, if B = A^{(p)}, p ∈ N, then the following equality holds:

    A^{(p)}(x* − x^{(k)}) = A^{(p−1)}(b − A x^{(k)}) = A^{(p−1)} r^{(k)}.   (9)

Specific cases can be described explicitly. For p = 1 the method is called steepest descent, while for p = 2 the method minimizes the Euclidean norm of the residuum and we call it the method of least residuum; in that case

    c_k = (r^{(k)}, A r^{(k)}) / (A r^{(k)}, A r^{(k)}).           (10)
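To make the iteration (2)-(10) concrete, the sketch below implements the update (2) for a symmetric positive definite matrix A, with the coefficient c_k obtained from (8) and (9): for p = 1 this gives the steepest descent step c_k = (r^{(k)}, r^{(k)})/(r^{(k)}, A r^{(k)}), and for p = 2 the least-residuum step (10). The function name, the stopping tolerance and the use of NumPy are our own illustrative choices and are not taken from the original text.

```python
import numpy as np

def gradient_descent_solver(A, b, x0, p=1, tol=1e-10, max_iter=10000):
    """Gradient iteration x_{k+1} = x_k + c_k r_k for symmetric positive
    definite A (eq. (2)).

    p = 1: steepest descent, c_k = (r, r) / (r, A r)      (from (8), (9))
    p = 2: least residuum,   c_k = (r, A r) / (A r, A r)  (eq. (10))
    """
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x                      # residuum r_k = b - A x_k, eq. (3)
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol:   # stop once the residuum (almost) vanishes
            return x, k
        Ar = A @ r
        if p == 1:
            c = (r @ r) / (r @ Ar)     # B = A^(1)
        else:
            c = (r @ Ar) / (Ar @ Ar)   # B = A^(2)
        x = x + c * r                  # eq. (2)
        r = r - c * Ar                 # eq. (4)
    return x, max_iter
```

For a small symmetric positive definite test system, e.g. A = [[4, 1], [1, 3]] and b = [1, 2], both variants converge to the same solution; returning the iteration count alongside the solution makes it easy to reproduce comparisons of iteration numbers like those reported later in the paper.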
III. CONJUGATE GRADIENT METHOD

We use the conjugate gradient method to solve a system of linear equations given in the form

    Ax = b,                                                        (11)

where A is a positive definite n × n matrix. As a result of this method we obtain a sequence of vectors starting from an initial vector x_0,

    x_0 → x_1 → · · · → x_n,

which in exact arithmetic stops after at most n steps, so that we obtain the required solution x_{n+1} = x. Since rounding errors occur in the calculations, the value of x_{n+1} is generally not an exact result. Therefore, in this method, as in all other iterative methods, we conduct further steps x_k → x_{k+1} until we reach a sufficiently accurate solution. Since the amount of work at each step consists essentially of one multiplication of the matrix A by a vector, the method is not well suited for banded and dense matrices; in contrast, it is well suited for medium-sized sparse matrices.

The general idea of the method is based on the fact that the functional F : R^n → R of the form

    F(z) = ½ (b − Az)^T A^{−1} (b − Az) = ½ z^T A z − b^T z + ½ b^T A^{−1} b

is minimized by the exact solution x, thus

    0 = F(x) = min_{z ∈ R^n} F(z).

This equality follows from the fact that the matrix A is positive definite, and thus the matrix A^{−1} has the same property, while the residual vector r := b − Az corresponding to the vector z vanishes only for z := x. This reminds us of the method of steepest descent, in which the sequence x_1 → x_2 → · · · is found by a one-dimensional minimization of the functional F in the direction of the gradient:

    x_{k+1}:  F(x_{k+1}) = min_u F(x_k + u r_k),

where

    r_k := −DF(x_k)^T = b − A x_k.

In the case of the conjugate gradient method, however, we carry out a k-dimensional minimization:

    x_{k+1}:  F(x_{k+1}) = min_{u_1,...,u_k} F(x_k + u_1 r_1 + · · · + u_k r_k),   r_i := b − A x_i for i ≤ k.   (12)

It turns out that x_{k+1} is easy to compute. In addition, the vectors r_i are orthogonal, and hence linearly independent, provided that r_k ≠ 0.

In the k-dimensional minimization (12) we admit all vectors z of the form

    z = x_k + R_k u,   u ∈ R^k,

where R_k := (r_1, . . . , r_k) and r = r_k + A R_k u is the corresponding residual vector. Hence

    F(z) = ½ r^T A^{−1} r = ½ r_k^T A^{−1} r_k + u^T R_k^T r_k + ½ u^T R_k^T A R_k u.

Differentiating the functional with respect to u, we find that it attains its minimum for

    x_{k+1} = x_k + R_k ũ,

where ũ ∈ R^k is the solution of the equation

    R_k^T A R_k ũ = −R_k^T r_k.                                    (13)

We introduce the following notation:

    R := (r_1, . . . , r_k, . . . , r_n),   v_k := (ũ^T, 0, . . . , 0)^T ∈ R^n,

i.e. v_k is ũ padded with n − k zeros; its components are denoted v_1, . . . , v_k. Then R v_k = R_k ũ, and

    x_{k+1} = x_k + R_k ũ = x_k + R v_k,                           (14)
    r_{k+1} = r_k + A R_k ũ = r_k + A R v_k.                       (15)

From the above, the following facts can be derived:

1. R_k^T r_{k+1} = 0, by (13) and (15); hence the vector r_{k+1} is orthogonal to the vectors r_1, . . . , r_k. Thus the right-hand side of equation (13) has the form

    −R_k^T r_k = (0, . . . , 0, ∗)^T   (k − 1 zeros),   so   R^T A R v_k = (0, . . . , 0, ∗, . . . , ∗)^T   (k − 1 leading zeros).   (16)

2. If r_k ≠ 0, then the vectors r_1, . . . , r_k are linearly independent and the matrix R_k^T A R_k is positive definite. Thereafter, since R_k^T r_k ≠ 0, equation (13) gives ũ ≠ 0 and ũ^T R_k^T A R_k ũ = −ũ^T R_k^T r_k > 0.

Because in the space R^n there are no more than n independent vectors, in exact calculations there is a smallest index k ≤ n + 1 with r_k = 0, and then x_k is the required solution.
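The subspace step just described can be checked numerically. The short sketch below performs, at every step, exactly the k-dimensional minimization (12): it collects the residuals into R_k, solves the projected system (13) (written here with the residual r_k = b − A x_k of Section II on the right-hand side), and updates x_{k+1} = x_k + R_k ũ as in (14). The helper name and tolerance are our own assumptions; this is an illustration of the idea, not an efficient implementation.

```python
import numpy as np

def cg_by_subspace_minimization(A, b, x1, tol=1e-12):
    """At step k, minimize F over x_k + span{r_1, ..., r_k} (cf. (12)-(14))."""
    x = np.asarray(x1, dtype=float).copy()
    r = b - A @ x                     # r_1 = b - A x_1
    residuals = []                    # columns of R_k
    for k in range(len(b)):
        if np.linalg.norm(r) <= tol:
            break                     # r_k = 0: x_k already solves A x = b
        residuals.append(r.copy())
        Rk = np.column_stack(residuals)        # R_k = (r_1, ..., r_k)
        # Stationarity of F(x_k + R_k u): (R_k^T A R_k) u = R_k^T r_k, cf. (13)
        u = np.linalg.solve(Rk.T @ A @ Rk, Rk.T @ r)
        x = x + Rk @ u                # x_{k+1} = x_k + R_k u, eq. (14)
        r = b - A @ x                 # new residual; note R_k^T r_{k+1} = 0
    return x
```

After each step one can verify that Rk.T @ r is numerically zero, which is exactly the orthogonality property of point 1 above; the three-term recurrences derived below avoid storing R_k altogether.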
If r_k ≠ 0, the vector v_k = (v_1, . . . , v_k, 0, . . . , 0)^T has k components that may be nonzero, and its component v_k is negative. From formula (15) follows the equality

    r_{k+1} − r_k = A R v_k = Σ_{j=1}^{k} v_j A r_j,

and therefore

    A r_k = ( (r_{k+1} − r_k) − Σ_{j=1}^{k−1} v_j A r_j ) / v_k.

Applying induction, we obtain a description of A r_k in the following form:

    A r_k = q_{k+1} (r_k − r_{k+1}) − e_k (r_{k−1} − r_k) − Σ_{j=1}^{k−2} c_j (r_j − r_{j+1}),   (e_1 := 0)   (17)

where q_{k+1}, e_k and c_j are constants and additionally q_{k+1} = −1/v_k > 0. Solving the above equation for r_{k+1} we get the equality

    r_{k+1} = r_k + [ −A r_k + e_k (r_k − r_{k−1}) + Σ_{j=1}^{k−2} c_j (r_{j+1} − r_j) ] / q_{k+1}.   (18)

If we take into account the orthogonality of the vectors r_1, . . . , r_k, r_{k+1}, we can easily demonstrate that all of the coefficients c_j vanish. Indeed, from equality (18) it follows first that c_1 = 0. Assume that for some j satisfying 1 ≤ j ≤ k − 2 there is the equality c_1 = c_2 = · · · = c_{j−1} = 0. Then, using formula (18) as before, we obtain

    r_j^T r_{k+1} = 0  ⇒  0 = r_k^T A r_j + c_j r_j^T r_j = r_k^T (linear combination of the vectors r_{j+1}, r_j, . . . , r_1) + c_j r_j^T r_j = c_j r_j^T r_j  ⇒  c_j = 0,

because the vectors r_1, . . . , r_k are linearly independent, which is apparent from the fact that r_k ≠ 0; in particular, r_j ≠ 0. Therefore all coefficients c_j, j = 1, . . . , k − 2, disappear from equations (17) and (18). Due to this fact we get

    r_{k−1}^T r_{k+1} = 0  ⇒  e_k = − (r_k^T A r_{k−1}) / (r_{k−1}^T r_{k−1}) = q_k (r_k^T r_k) / (r_{k−1}^T r_{k−1}),

    r_k^T r_{k+1} = 0  ⇒  q_{k+1} = (r_k^T A r_k) / (r_k^T r_k) − e_k.

This gives us the algorithm of the conjugate gradient method.

Initial data: x_1 is given, r_1 := b − A x_1, e_1 := 0.
For k = 1, 2, . . . , n:
1) If r_k = 0, stop; x_k is the sought solution.
2) Otherwise, calculate:

    q_{k+1} := (r_k^T A r_k) / (r_k^T r_k) − e_k,
    r_{k+1} := r_k + (−A r_k + e_k (r_k − r_{k−1})) / q_{k+1},
    x_{k+1} := x_k + (−r_k + e_k (x_k − x_{k−1})) / q_{k+1},
    e_{k+1} := q_{k+1} (r_{k+1}^T r_{k+1}) / (r_k^T r_k).
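A minimal NumPy sketch of this algorithm follows; the function name, the tolerance used as the numerical test for r_k = 0, and the small test system are our own illustrative additions. The sketch stores the vector g_k := A x_k − b, which equals the residuum of Section II up to sign; with this sign convention the four update formulas above apply verbatim, and ||g_k|| serves as the stopping test.

```python
import numpy as np

def conjugate_gradient(A, b, x1, tol=1e-12):
    """Three-term recurrence form of the conjugate gradient method,
    following the updates for q_{k+1}, r_{k+1}, x_{k+1}, e_{k+1} above.

    g_k stores A x_k - b (the gradient of F at x_k, i.e. the residuum of
    Section II up to sign).
    """
    n = len(b)
    x = np.asarray(x1, dtype=float).copy()
    x_prev = x.copy()
    g = A @ x - b                      # g_1 = A x_1 - b
    g_prev = g.copy()
    e = 0.0                            # e_1 := 0
    for k in range(1, n + 1):
        if np.linalg.norm(g) <= tol:   # 1) if the residuum vanishes, stop
            break
        Ag = A @ g
        q = (g @ Ag) / (g @ g) - e                   # q_{k+1}
        g_next = g + (-Ag + e * (g - g_prev)) / q    # r_{k+1}-type update
        x_next = x + (-g + e * (x - x_prev)) / q     # x_{k+1}-type update
        e = q * (g_next @ g_next) / (g @ g)          # e_{k+1}
        x_prev, x = x, x_next
        g_prev, g = g, g_next
    return x


# Illustrative 2x2 symmetric positive definite test system (ours, not from the paper).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # approx. [0.0909, 0.6364]
```

In exact arithmetic the loop terminates after at most n passes, in agreement with the discussion at the beginning of this section.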
IV. THE COMPARISON OF METHODS

A. Analysis of efficiency

There is a clear difference between the presented methods. In terms of execution time the steepest descent method is faster than the conjugate gradient method. However, if we look at the iteration counts in Table I, we see that the conjugate gradient method converges after far fewer iterations. For example, for N = 20 the conjugate gradient method needs 65 iterations, while the steepest descent method needs 1413 iterations to achieve the desired accuracy.

Tab. I
COMPARISON OF METHODS

                     Conjugate gradient method           Steepest descent method
  N      n      Iterations  Time in ms  Time in ti   Iterations  Time in ms  Time in ti
  5      25         13           1          1983        128           0          1780
 10     100         33          36         68281        405          34         64634
 15     225         49         286        530934        837         272        505238
 20     400         65        1680       3110493       1413        1621       3002271
 25     625         79        6499      12032438       2093        6329      11716167
 30     900         94       20053      37122809       2909       19606      36295497

B. Analysis of time

On the time graph we can see that, for example, for N = 15 the time in ms of the conjugate gradient method equals 286, while that of the steepest descent method equals 272. We can also see that, as N increases, the difference between the methods grows. For example, for N = 20 the time of the conjugate gradient method equals 1680 ms, while the steepest descent method achieves the desired accuracy in 1621 ms.

V. CONCLUSIONS

We have compared two methods implemented to solve systems of linear equations. The steepest descent method is the faster one, because it solves the test systems in a smaller amount of time. The conjugate gradient method is slower, but more productive, because it converges after fewer iterations. Thus one method can be used when we want to find a solution very fast, and the other when we want to reach convergence in fewer iterations.

REFERENCES

[1] O. Axelsson, G. Lindskog, "On the Rate of Convergence of the Preconditioned Conjugate Gradient Method," Numerische Mathematik, vol. 48, pp. 499-523, 1986.
[2] M. Avriel, Nonlinear Programming: Analysis and Methods, Dover Publishing, 2003, ISBN 0-486-43227-0.
[3] D. Połap, M. Woźniak, C. Napoli, and E. Tramontana, "Real-Time Cloud-based Game Management System via Cuckoo Search Algorithm," International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 333-338, 2015, DOI: 10.1515/eletel-2015-0043.
[4] P. Artiemjew, "Stability of Optimal Parameters for Classifier Based on Simple Granules of Knowledge," Technical Sciences, vol. 14, no. 1, pp. 57-69, UWM Publisher, Olsztyn, 2011.
[5] A. V. Knyazev, I. Lashuk, "Steepest Descent and Conjugate Gradient Methods with Variable Preconditioning," SIAM Journal on Matrix Analysis and Applications, vol. 29, no. 4, p. 1267, DOI: 10.1137/060675290.
[6] Z. Marszałek, "Novel Recursive Fast Sort Algorithm," in Communications in Computer and Information Science, vol. 639, pp. 344-355, 2016, DOI: 10.1007/978-3-319-46254-7_27.
[7] G. Capizzi, F. Bonanno, C. Napoli, "Hybrid neural networks architectures for SOC and voltage prediction of new generation batteries storage," International Conference on Clean Electrical Power (ICCEP), pp. 341-344, IEEE, 2015, DOI: 10.1109/ICCEP.2011.6036301.
[8] C. Napoli, F. Bonanno, G. Capizzi, "An hybrid neuro-wavelet approach for long-term prediction of solar wind," Proceedings of the International Astronomical Union, vol. 6, no. 274, pp. 153-155, IAU, 2015, DOI: 10.1017/S174392131100679X.
[9] Y. Notay, "Flexible Conjugate Gradients," SIAM Journal on Scientific Computing, vol. 22, no. 4, p. 1444, 2000, DOI: 10.1137/S1064827599362314.
[10] D. Połap, M. Woźniak, C. Napoli, and E. Tramontana, "Is swarm intelligence able to create mazes?" International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 305-310, 2015, DOI: 10.1515/eletel-2015-0039.
[11] C. Napoli, G. Pappalardo, E. Tramontana, "An agent-driven semantical identifier using radial basis neural networks and reinforcement learning," XV Workshop Dagli Oggetti agli Agenti (WOA), CEUR-WS, vol. 1260, 2015, DOI: 10.13140/2.1.1446.7843.
[12] D. Połap, M. Woźniak, "Flexible Neural Network Architecture for Handwritten Signatures Recognition," International Journal of Electronics and Telecommunications, vol. 62, no. 2, pp. 197-202, 2016, DOI: 10.1515/eletel-2016-0027.
[13] J. A. Snyman, Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms, Springer Publishing, 2005, ISBN 0-387-24348-8.
[14] J. Toldinas, R. Damasevicius, A. Venckauskas, T. Blazauskas, and J. Ceponis, "Energy Consumption of Cryptographic Algorithms in Mobile Devices," Elektronika ir Elektrotechnika, vol. 20, pp. 158-161, 2014.

Fig. 2. Block Diagram of the Steepest Descent Method
Fig. 3. Block Diagram of the Conjugate Gradient Method