Comparison of steepest descent method and conjugate gradient method

Oleksandra Osadcha
Faculty of Applied Mathematics
Silesian University of Technology
Gliwice, Poland
Email: oleosa338@student.polsl.pl

Zbigniew Marszałek
Institute of Mathematics
Silesian University of Technology
Gliwice, Poland
Email: Zbigniew.Marszalek@polsl.pl

Copyright © 2017 held by the authors

Abstract—This paper presents a comparison of two methods used for solving systems of linear equations. The aim of the research is to analyze which method solves these equations faster and how many iterations each method requires. The paper presents ways to deal with this problem and analyzes the two methods: the first minimizes, in a suitable norm, the deviation of the next iterate from the solution, while the other constructs orthogonal vectors and is a theoretically finite method.

Index Terms—conjugate gradient method, steepest descent method, comparison, analysis

I. INTRODUCTION

Computer algorithms are important tools for numerical processing. In all implementations it is important to make them more efficient and to decrease their complexity, but without loss of effectiveness. Cryptographic algorithms are among the most important methods in computer science, therefore their efficiency and complexity are crucial for optimal processing [14]. Knowledge systems demand fast and precise methods for information processing [4], [6]. Similarly, decision support models need dedicated techniques of lower complexity [12].

This publication presents a comparison of the steepest descent method and the conjugate gradient method. These methods are used for solving systems of linear equations. We analyze which method is faster and how many iterations each method requires. First we describe the methods, then we compare them and draw conclusions.

II. THE STEEPEST DESCENT METHOD

The steepest descent method was formulated by Stiefel. We present the mathematical description of the method and an implementation in the form of code. Consider a system of linear equations in the general form

    Ax = b.                                                        (1)

We assume that the matrix A is symmetric and positive definite. By x* we denote the solution of system (1), i.e. Ax* = b.

Gradient methods construct a sequence of successive approximations to the solution x* using the formula

    x^{(k+1)} = x^{(k)} + c_k r^{(k)}.                             (2)

The vector r^{(k)} occurring in equation (2) is called the residuum of the system and takes the following form:

    r^{(k)} = b − A x^{(k)},                                       (3)

or, equivalently,

    r^{(k+1)} = r^{(k)} − c_k A r^{(k)} = (I − c_k A) r^{(k)}.     (4)

In the iterative process this vector can therefore be determined in each iteration of the algorithm as a suitable combination of the previous values, as described by the formula

    r^{(k+1)} = r^{(k)} − c_k A r^{(k)}.                           (5)

The coefficients c_k occurring in equation (2) are determined on the basis of the following equality:

    ||x* − x^{(k+1)}||_B = inf_{c_k} ||x* − x^{(k)} − c_k r^{(k)}||_B,   (6)

where the norm ||·||_B is defined as

    ||x||_B := √(x^T B x).                                         (7)

The matrix B occurring in formula (7) is symmetric and positive definite. At the same time it fulfills the commutation condition with the matrix A, i.e. A·B = B·A. It is not difficult to show that the coefficients c_k are described by the equation

    c_k = (r^{(k)}, B(x* − x^{(k)})) / (r^{(k)}, B r^{(k)}).       (8)

Only for certain matrices B can the value of B(x* − x^{(k)}) actually be determined. For example, if B = A^{(p)}, p ∈ N, then the following equality holds:

    A^{(p)}(x* − x^{(k)}) = A^{(p−1)}(b − A x^{(k)}) = A^{(p−1)} r^{(k)}.   (9)

Specific cases can be described explicitly. For p = 1 the method is called steepest descent, while for p = 2 the method minimizes the Euclidean norm of the residuum and we call it the method of least residuum; in that case

    c_k = (r^{(k)}, A r^{(k)}) / (A r^{(k)}, A r^{(k)}).           (10)
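To make the iteration (2)-(10) concrete, the sketch below implements the update (2) for a symmetric positive definite matrix A, with the coefficient c_k obtained from (8) and (9): for p = 1 this gives the steepest descent step c_k = (r^{(k)}, r^{(k)})/(r^{(k)}, A r^{(k)}), and for p = 2 the least-residuum step (10). The function name, the stopping tolerance and the use of NumPy are our own illustrative choices and are not taken from the original text.

```python
import numpy as np

def gradient_descent_solver(A, b, x0, p=1, tol=1e-10, max_iter=10000):
    """Gradient iteration x_{k+1} = x_k + c_k r_k for symmetric positive
    definite A (eq. (2)).

    p = 1: steepest descent, c_k = (r, r) / (r, A r)      (from (8), (9))
    p = 2: least residuum,   c_k = (r, A r) / (A r, A r)  (eq. (10))
    """
    x = np.asarray(x0, dtype=float).copy()
    r = b - A @ x                      # residuum r_k = b - A x_k, eq. (3)
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol:   # stop once the residuum (almost) vanishes
            return x, k
        Ar = A @ r
        if p == 1:
            c = (r @ r) / (r @ Ar)     # B = A^(1)
        else:
            c = (r @ Ar) / (Ar @ Ar)   # B = A^(2)
        x = x + c * r                  # eq. (2)
        r = r - c * Ar                 # eq. (4)
    return x, max_iter
```

For a small symmetric positive definite test system, e.g. A = [[4, 1], [1, 3]] and b = [1, 2], both variants converge to the same solution; returning the iteration count alongside the solution makes it easy to reproduce comparisons of iteration numbers like those reported later in the paper.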
III. CONJUGATE GRADIENT METHOD

We use the conjugate gradient method to solve a system of linear equations given in the form

    Ax = b,                                                        (11)

where A is a positive definite n × n matrix. As a result of this method we obtain a sequence of vectors starting from an initial vector x_0,

    x_0 → x_1 → · · · → x_n,

which in exact arithmetic stops after at most n steps, so that we obtain the required solution x_{n+1} = x. Since rounding errors occur in the calculations, the value of x_{n+1} is generally not an exact result. Therefore, in this method, as in all other iterative methods, we conduct further steps x_k → x_{k+1} until we reach a sufficiently accurate solution. Since the amount of work at each step consists essentially of one multiplication of the matrix A by a vector, the method is not well suited for banded and dense matrices; in contrast, it is well suited for medium-sized sparse matrices.

The general idea of the method is based on the fact that the functional F : R^n → R of the form

    F(z) = ½ (b − Az)^T A^{−1} (b − Az) = ½ z^T A z − b^T z + ½ b^T A^{−1} b

is minimized by the exact solution x, thus

    0 = F(x) = min_{z ∈ R^n} F(z).

This equality follows from the fact that the matrix A is positive definite, and thus the matrix A^{−1} has the same property, while the residual vector r := b − Az corresponding to the vector z vanishes only for z := x. This reminds us of the method of steepest descent, in which the sequence x_1 → x_2 → · · · is found by a one-dimensional minimization of the functional F in the direction of the gradient:

    x_{k+1}:  F(x_{k+1}) = min_u F(x_k + u r_k),

where

    r_k := −DF(x_k)^T = b − A x_k.

In the case of the conjugate gradient method, however, we carry out a k-dimensional minimization:

    x_{k+1}:  F(x_{k+1}) = min_{u_1,...,u_k} F(x_k + u_1 r_1 + · · · + u_k r_k),   r_i := b − A x_i for i ≤ k.   (12)

It turns out that x_{k+1} is easy to compute. In addition, the vectors r_i are orthogonal, and hence linearly independent, provided that r_k ≠ 0.

In the k-dimensional minimization (12) we admit all vectors z of the form

    z = x_k + R_k u,   u ∈ R^k,

where R_k := (r_1, . . . , r_k) and r = r_k + A R_k u is the corresponding residual vector. Hence

    F(z) = ½ r^T A^{−1} r = ½ r_k^T A^{−1} r_k + u^T R_k^T r_k + ½ u^T R_k^T A R_k u.

Differentiating the functional with respect to u, we find that it attains its minimum for

    x_{k+1} = x_k + R_k ũ,

where ũ ∈ R^k is the solution of the equation

    R_k^T A R_k ũ = −R_k^T r_k.                                    (13)

We introduce the following notation:

    R := (r_1, . . . , r_k, . . . , r_n),   v_k := (ũ^T, 0, . . . , 0)^T ∈ R^n,

i.e. v_k is ũ padded with n − k zeros; its components are denoted v_1, . . . , v_k. Then R v_k = R_k ũ, and

    x_{k+1} = x_k + R_k ũ = x_k + R v_k,                           (14)
    r_{k+1} = r_k + A R_k ũ = r_k + A R v_k.                       (15)

From the above, the following facts can be derived:

1. R_k^T r_{k+1} = 0, by (13) and (15); hence the vector r_{k+1} is orthogonal to the vectors r_1, . . . , r_k. Thus the right-hand side of equation (13) has the form

    −R_k^T r_k = (0, . . . , 0, ∗)^T   (k − 1 zeros),   so   R^T A R v_k = (0, . . . , 0, ∗, . . . , ∗)^T   (k − 1 leading zeros).   (16)

2. If r_k ≠ 0, then the vectors r_1, . . . , r_k are linearly independent and the matrix R_k^T A R_k is positive definite. Thereafter, since R_k^T r_k ≠ 0, equation (13) gives ũ ≠ 0 and ũ^T R_k^T A R_k ũ = −ũ^T R_k^T r_k > 0.

Because in the space R^n there are no more than n independent vectors, in exact calculations there is a smallest index k ≤ n + 1 with r_k = 0, and then x_k is the required solution.
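The subspace step just described can be checked numerically. The short sketch below performs, at every step, exactly the k-dimensional minimization (12): it collects the residuals into R_k, solves the projected system (13) (written here with the residual r_k = b − A x_k of Section II on the right-hand side), and updates x_{k+1} = x_k + R_k ũ as in (14). The helper name and tolerance are our own assumptions; this is an illustration of the idea, not an efficient implementation.

```python
import numpy as np

def cg_by_subspace_minimization(A, b, x1, tol=1e-12):
    """At step k, minimize F over x_k + span{r_1, ..., r_k} (cf. (12)-(14))."""
    x = np.asarray(x1, dtype=float).copy()
    r = b - A @ x                     # r_1 = b - A x_1
    residuals = []                    # columns of R_k
    for k in range(len(b)):
        if np.linalg.norm(r) <= tol:
            break                     # r_k = 0: x_k already solves A x = b
        residuals.append(r.copy())
        Rk = np.column_stack(residuals)        # R_k = (r_1, ..., r_k)
        # Stationarity of F(x_k + R_k u): (R_k^T A R_k) u = R_k^T r_k, cf. (13)
        u = np.linalg.solve(Rk.T @ A @ Rk, Rk.T @ r)
        x = x + Rk @ u                # x_{k+1} = x_k + R_k u, eq. (14)
        r = b - A @ x                 # new residual; note R_k^T r_{k+1} = 0
    return x
```

After each step one can verify that Rk.T @ r is numerically zero, which is exactly the orthogonality property of point 1 above; the three-term recurrences derived below avoid storing R_k altogether.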
If r_k ≠ 0, the vector v_k = (v_1, . . . , v_k, 0, . . . , 0)^T has k components that may be nonzero, and its component v_k is negative. From formula (15) follows the equality

    r_{k+1} − r_k = A R v_k = Σ_{j=1}^{k} v_j A r_j,

and therefore

    A r_k = ( (r_{k+1} − r_k) − Σ_{j=1}^{k−1} v_j A r_j ) / v_k.

Applying induction, we obtain a description of A r_k in the following form:

    A r_k = q_{k+1} (r_k − r_{k+1}) − e_k (r_{k−1} − r_k) − Σ_{j=1}^{k−2} c_j (r_j − r_{j+1}),   (e_1 := 0)   (17)

where q_{k+1}, e_k and c_j are constants and additionally q_{k+1} = −1/v_k > 0. Solving the above equation for r_{k+1} we get the equality

    r_{k+1} = r_k + [ −A r_k + e_k (r_k − r_{k−1}) + Σ_{j=1}^{k−2} c_j (r_{j+1} − r_j) ] / q_{k+1}.   (18)

If we take into account the orthogonality of the vectors r_1, . . . , r_k, r_{k+1}, we can easily demonstrate that all of the coefficients c_j vanish. Indeed, from equality (18) it follows first that c_1 = 0. Assume that for some j satisfying 1 ≤ j ≤ k − 2 there is the equality c_1 = c_2 = · · · = c_{j−1} = 0. Then, using formula (18) as before, we obtain

    r_j^T r_{k+1} = 0  ⇒  0 = r_k^T A r_j + c_j r_j^T r_j = r_k^T (linear combination of the vectors r_{j+1}, r_j, . . . , r_1) + c_j r_j^T r_j = c_j r_j^T r_j  ⇒  c_j = 0,

because the vectors r_1, . . . , r_k are linearly independent, which is apparent from the fact that r_k ≠ 0; in particular, r_j ≠ 0. Therefore all coefficients c_j, j = 1, . . . , k − 2, disappear from equations (17) and (18). Due to this fact we get

    r_{k−1}^T r_{k+1} = 0  ⇒  e_k = − (r_k^T A r_{k−1}) / (r_{k−1}^T r_{k−1}) = q_k (r_k^T r_k) / (r_{k−1}^T r_{k−1}),

    r_k^T r_{k+1} = 0  ⇒  q_{k+1} = (r_k^T A r_k) / (r_k^T r_k) − e_k.

This gives us the algorithm of the conjugate gradient method.

Initial data: x_1 is given, r_1 := b − A x_1, e_1 := 0.
For k = 1, 2, . . . , n:
1) If r_k = 0, stop; x_k is the sought solution.
2) Otherwise, calculate:

    q_{k+1} := (r_k^T A r_k) / (r_k^T r_k) − e_k,
    r_{k+1} := r_k + (−A r_k + e_k (r_k − r_{k−1})) / q_{k+1},
    x_{k+1} := x_k + (−r_k + e_k (x_k − x_{k−1})) / q_{k+1},
    e_{k+1} := q_{k+1} (r_{k+1}^T r_{k+1}) / (r_k^T r_k).
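A minimal NumPy sketch of this algorithm follows; the function name, the tolerance used as the numerical test for r_k = 0, and the small test system are our own illustrative additions. The sketch stores the vector g_k := A x_k − b, which equals the residuum of Section II up to sign; with this sign convention the four update formulas above apply verbatim, and ||g_k|| serves as the stopping test.

```python
import numpy as np

def conjugate_gradient(A, b, x1, tol=1e-12):
    """Three-term recurrence form of the conjugate gradient method,
    following the updates for q_{k+1}, r_{k+1}, x_{k+1}, e_{k+1} above.

    g_k stores A x_k - b (the gradient of F at x_k, i.e. the residuum of
    Section II up to sign).
    """
    n = len(b)
    x = np.asarray(x1, dtype=float).copy()
    x_prev = x.copy()
    g = A @ x - b                      # g_1 = A x_1 - b
    g_prev = g.copy()
    e = 0.0                            # e_1 := 0
    for k in range(1, n + 1):
        if np.linalg.norm(g) <= tol:   # 1) if the residuum vanishes, stop
            break
        Ag = A @ g
        q = (g @ Ag) / (g @ g) - e                   # q_{k+1}
        g_next = g + (-Ag + e * (g - g_prev)) / q    # r_{k+1}-type update
        x_next = x + (-g + e * (x - x_prev)) / q     # x_{k+1}-type update
        e = q * (g_next @ g_next) / (g @ g)          # e_{k+1}
        x_prev, x = x, x_next
        g_prev, g = g, g_next
    return x


# Illustrative 2x2 symmetric positive definite test system (ours, not from the paper).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # approx. [0.0909, 0.6364]
```

In exact arithmetic the loop terminates after at most n passes, in agreement with the discussion at the beginning of this section.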
IV. THE COMPARISON OF METHODS

A. Analysis of efficiency

There is a clear difference between the presented methods. In terms of execution time the steepest descent method is faster than the conjugate gradient method. However, if we look at the iteration counts in Table I, we see that the conjugate gradient method converges after far fewer iterations. For example, for N = 20 the conjugate gradient method needs 65 iterations, while the steepest descent method needs 1413 iterations to achieve the desired accuracy.

Tab. I
COMPARISON OF METHODS

                     Conjugate gradient method           Steepest descent method
  N      n      Iterations  Time in ms  Time in ti   Iterations  Time in ms  Time in ti
  5      25         13           1          1983        128           0          1780
 10     100         33          36         68281        405          34         64634
 15     225         49         286        530934        837         272        505238
 20     400         65        1680       3110493       1413        1621       3002271
 25     625         79        6499      12032438       2093        6329      11716167
 30     900         94       20053      37122809       2909       19606      36295497

B. Analysis of time

On the time graph we can see that, for example, for N = 15 the time in ms of the conjugate gradient method equals 286, while that of the steepest descent method equals 272. We can also see that, as N increases, the difference between the methods grows. For example, for N = 20 the time of the conjugate gradient method equals 1680 ms, while the steepest descent method achieves the desired accuracy in 1621 ms.

V. CONCLUSIONS

We have compared two methods implemented to solve systems of linear equations. The steepest descent method is the faster one, because it solves the test systems in a smaller amount of time. The conjugate gradient method is slower, but more productive, because it converges after fewer iterations. Thus one method can be used when we want to find a solution very fast, and the other when we want to reach convergence in fewer iterations.

REFERENCES

[1] O. Axelsson, G. Lindskog, "On the Rate of Convergence of the Preconditioned Conjugate Gradient Method," Numerische Mathematik, vol. 48, pp. 499-523, 1986.
[2] M. Avriel, Nonlinear Programming: Analysis and Methods, Dover Publishing, 2003, ISBN 0-486-43227-0.
[3] D. Połap, M. Woźniak, C. Napoli, and E. Tramontana, "Real-Time Cloud-based Game Management System via Cuckoo Search Algorithm," International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 333-338, 2015, DOI: 10.1515/eletel-2015-0043.
[4] P. Artiemjew, "Stability of Optimal Parameters for Classifier Based on Simple Granules of Knowledge," Technical Sciences, vol. 14, no. 1, pp. 57-69, UWM Publisher, Olsztyn, 2011.
[5] A. V. Knyazev, I. Lashuk, "Steepest Descent and Conjugate Gradient Methods with Variable Preconditioning," SIAM Journal on Matrix Analysis and Applications, vol. 29, no. 4, p. 1267, DOI: 10.1137/060675290.
[6] Z. Marszałek, "Novel Recursive Fast Sort Algorithm," in Communications in Computer and Information Science, vol. 639, pp. 344-355, 2016, DOI: 10.1007/978-3-319-46254-7_27.
[7] G. Capizzi, F. Bonanno, C. Napoli, "Hybrid neural networks architectures for SOC and voltage prediction of new generation batteries storage," International Conference on Clean Electrical Power (ICCEP), pp. 341-344, IEEE, 2015, DOI: 10.1109/ICCEP.2011.6036301.
[8] C. Napoli, F. Bonanno, G. Capizzi, "An hybrid neuro-wavelet approach for long-term prediction of solar wind," Proceedings of the International Astronomical Union, vol. 6, no. 274, pp. 153-155, IAU, 2015, DOI: 10.1017/S174392131100679X.
[9] Y. Notay, "Flexible Conjugate Gradients," SIAM Journal on Scientific Computing, vol. 22, no. 4, p. 1444, 2000, DOI: 10.1137/S1064827599362314.
[10] D. Połap, M. Woźniak, C. Napoli, and E. Tramontana, "Is swarm intelligence able to create mazes?" International Journal of Electronics and Telecommunications, vol. 61, no. 4, pp. 305-310, 2015, DOI: 10.1515/eletel-2015-0039.
[11] C. Napoli, G. Pappalardo, E. Tramontana, "An agent-driven semantical identifier using radial basis neural networks and reinforcement learning," XV Workshop Dagli Oggetti agli Agenti (WOA), CEUR-WS, vol. 1260, 2015, DOI: 10.13140/2.1.1446.7843.
[12] D. Połap, M. Woźniak, "Flexible Neural Network Architecture for Handwritten Signatures Recognition," International Journal of Electronics and Telecommunications, vol. 62, no. 2, pp. 197-202, 2016, DOI: 10.1515/eletel-2016-0027.
[13] J. A. Snyman, Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms, Springer Publishing, 2005, ISBN 0-387-24348-8.
[14] J. Toldinas, R. Damasevicius, A. Venckauskas, T. Blazauskas, and J. Ceponis, "Energy Consumption of Cryptographic Algorithms in Mobile Devices," Elektronika ir Elektrotechnika, vol. 20, pp. 158-161, 2014.

Fig. 2. Block Diagram of the Steepest Descent Method
Fig. 3. Block Diagram of the Conjugate Gradient Method