=Paper= {{Paper |id=Vol-1623/papermp17 |storemode=property |title=A Variant of the Multi-Step Bundle Method |pdfUrl=https://ceur-ws.org/Vol-1623/papermp17.pdf |volume=Vol-1623 |authors=Rashid Yarullin |dblpUrl=https://dblp.org/rec/conf/door/Yarullin16 }} ==A Variant of the Multi-Step Bundle Method== https://ceur-ws.org/Vol-1623/papermp17.pdf
       A Variant of the Multi-Step Bundle Method⋆

                                          Rashid Yarullin

      Kazan (Volga Region) Federal University, Institute of Computational Mathematics
        and Information Technologies, Kremlyovskaya str. 35, 420008 Kazan, Russian
                                {yarullinrs}@gmail.com
                                   http://www.kpfu.ru



        Abstract. A method from a class of bundle methods is proposed to solve an
        unconstrained optimization problem. In this method an epigraph of the ob-
        jective function is approximated by the set which is formed on the basis of the
        convex quadratic function. This method is characterized in that iteration points
        are constructed in terms of information obtained in the previous steps of the
        minimization process. Computational aspects of the proposed method are dis-
        cussed, convergence of this one is proved, and convergence rate of the iteration
        process is obtained.

        Keywords: a bundle method, an epigraph, multi-step methods, approximation
        sets, convergence rate


1     Introduction
A class of bundle methods is quite wide (e.g. [2–6]). Such methods use multi-step
approach of constructing iteration points to solve a convex programming problem.
Namely, the next approximation is formed in term of prehistory of the solution process
by minimizing an auxiliary convex quadratic function. Taking into account this feature
the given methods profitably differ from one-step methods by constructing anti gully
trajectory of the iteration points and good convergence rate.
    In this paper the method is proposed for solving a convex programming problem
which belongs to the mentioned class. The suggested method also applies multi-step
technique of constructing approximations. Moreover, note that unlike the famous bun-
dle methods the solution of the auxiliary quadratic programming problem is obtained
in the proposed method by the formula, and this fact is convinient to use in practical
implementations of the method.


2     Problem Setting
The method is proposed for solving the following problem:
⋆
    The research was supported by the Russian Fundamental Research Foundation within the
    project 16-31-60014.
Copyright ⃝
          c by the paper’s authors. Copying permitted for private and academic purposes.
In: A. Kononov et al. (eds.): DOOR 2016, Vladivostok, Russia, published at http://ceur-ws.org
316       R. Yarullin



                                        min{f (x) : x ∈ Rn },                            (1)
where f (x) is a continuously differentiable convex function defined in an n-dimensional
Euclidian space Rn , and the gradient of the function f (x) satisfies Lipschitz continuous
condition |f ′ (x) − f ′ (y)| ≤ L||x − y|| for all x, y ∈ Rn with the parameter L > 0.
   Let f ∗ = min{f (x) : x ∈ Rn }, X ∗ = {x ∈ Rn : f (x) = f ∗ } ̸= ∅, epi(f ) = {(x, δ) ∈
Rn+1 : δ ≥ f (x)}, K = {0, 1, . . .}. By |B| denote the cardinality of the set B ⊂ Rn .


3      The Bundle Method
A sequence {xk }, k ∈ K, is constructed by the proposed method as follows.
   0. Define numbers l > 0, γ > 0 and r ∈ K such that r ≥ 1. Select any point v ∈ Rn .
Assign η0 = l, k = 0 and B0 = {v}.
   1. Choose
                             xk = argmin{f (y) : y ∈ Bk },                        (2)
and find a point zk as a solution of the problem

                                       min{φk (x) : x ∈ Rn },                            (3)

where
                                                                    γ
                        φk (x) = f (xk ) + ⟨f ′ (xk ), x − xk ⟩ +     ||x − xk ||2 .     (4)
                                                                    2
Determine φk = φk (zk ).
   2. Choose a number αk (b) ∈ (0, 1] for each b ∈ Bk such that
                                                 (                        )
          (uk (b), µk (b)) = (b, θk (b)) + αk (b) (zk , φk ) − (b, θk (b)) ∈ epi(f ),    (5)

where
                                         θk (b) = f (b) + ηk .                           (6)
.
      3. Find a point
                                 wk = argmin{f (uk (b)) : b ∈ Bk }.                      (7)
      4. If the inequality |Bk | < r is defined, then construct the next set
                                                   ∪
                                       Bk+1 = Bk {wk }.                                  (8)

Otherwise find a point vk = argmax{f (a) : a ∈ Bk }, and determine
                                               ∩
                             Bk+1 = Bk \ {vk } {wk }.                                    (9)
      5. Assign
                                                       l
                                          ηk+1 =            ,                           (10)
                                                   (k + 1)2
increment k by one, and go to Step 1.
    Firstly, lets represent some properties of the suggested method.
                                                             A Multi-Step Bundle Method           317

Remark 1. The point v which is described at Step 0 in the algorithm is the initial
iteration point. Unfortunately, there is no any general approach of constructing the
initial iteration point in nonlinear programming methods. But in the process of solving
practical optimization problems there are some informations to construct a region
which approximate the set of solutions. Consequently, it is obviously to select the
initial iteration point v as close as possible to the mentioned region.
Remark 2. Note that as well as the multi-step methods the proposed method uses
prehistory of the solution process. Namely, according to (3), (5), (7) and Step 4 of the
algorithm the next approximation xk+1 , k > r, is selected from the set Bk+1 which is
constructed on the basis of the last r > 0 iteration points of the sequence {xi }, i ∈ K.
    In the bundle methods each iteration point is obtained by minimizing the auxiliary
quadratic function constructed on the basis of the model of the objective function. Since
this model consists of several cutting planes, then it is nessesary to use various numerical
methods for solving quadratic programming problems. In the proposed method the
model of the objective function contains only one cutting plane. Thus, the solution of
the auxiliary quadratic problem can be found by the formula. This result is represented
in the following statement.
Lemma 1. Suppose that sequences {zk }, {φk }, k ∈ K, are constructed by the proposed
method. Then equalities
                                          f ′ (xk )
                              zk = xk −             ,                           (11)
                                               γ
                                                     ||f ′ (xk )||2
                                    φk = f (xk ) −                                               (12)
                                                           2γ
are defined for all k ∈ K.
Proof. Note that the function φk (x) is differentiable and strongly convex. Consequently,
problem (3) has unique solutions for all k ∈ K. Lets compute partial derivatives of the
                                                                           ′
function φk (x) which have the following form: ∂φ            k (x)
                                                            ∂x[i] = f (xk )[i] + (x[i] − xk [i])γ = 0,
i = 1, n. Hence, equation (11) is defined. Further, in view of (4), (11) we have φk =
                        ′              ′                         ′
φk (zk ) = f (xk ) − ||f (xγk )|| + ||f (x        = f (xk ) − ||f (x
                                 2              2                         2
                                           k )||                     k )||
                                         2γ                        2γ        . The lemma is proved.
Lemma 2. Suppose that the sequence {xk }, k ∈ K, is constructed by the suggested
method. Then the inequality
                                                  (            ) αk (xk ) ′
           f (xk+1 ) ≤ f (uk (xk )) ≤ f (xk ) + ηk 1 − αk (xk ) −        ||f (xk )||2            (13)
                                                                   2γ
is satisfied for all k ∈ K.
Proof. In accordance with ways (8), (9) of constructing the set Bk+1 the inclusion
wk ∈ Bk+1 is defined for all k ∈ K. Hence, in view of (2), (7) the expression f (xk+1 ) ≤
f (wk ) ≤ f (uk (xk )) is determined. Further,
                                             ( taking into ) account (5), (6), (12) we have
f (uk (xk )) ≤ µk (xk ) = θk (xk ) + αk (xk ) φk − θk (xk ) = f (xk ) + ηk + αk (xk )f (xk ) −
αk (xk )   ′
                                                             (          ) αk (xk ) ′
   2γ ||f (xk )|| −αk (xk )f (xk )−αk (xk )ηk = f (xk )+ηk 1−αk (xk ) − 2γ ||f (xk )|| .
                 2                                                                           2

The Lemma is proved.
318     R. Yarullin

    Before proving convergence of the proposed method lets construct the parameter
αk (xk ), k ∈ K, by the following rule.

Lemma 3. If
                                         (φk , zk ) ∈ epi(f ),                                  (14)
then αk (xk ) = 1. Otherwise αk (xk ) is selected so that the equation

                                        f (uk (xk )) = µk (xk )                                 (15)

is defined. Then there exists a constant c > 0 such that

                                      αk (xk ) ≥ c       ∀k ∈ K.                                (16)

Proof. Lets fix numbers ε ∈ (1, 2) and i ≥ 0 such that

                                            L2−i
                                                 ≤ 2 − ε,                                       (17)
                                             γ

where L > 0 is Lipshitz constant of the gradient f ′ (x).
    Since f (x) is a continuously differentiable functions, and its gradient satisfies Lip-
shitz condition, then in view of ηk (1 − 2−i ) ≥ 0 for all k ∈ K we give f (xk ) −
         −i                                                −i                          −i
f (xk − 2γ f ′ (xk )) + ηk (1 − 2−i ) ≥ f (xk ) − f (xk − 2γ f ′ (xk )) ≥ ⟨f ′ (xk ), 2γ f ′ (xk )⟩ −
L 2−i ′            2−i  ′          L −2i−1                   −i−1                    −i

2 || γ f (xk )|| = γ ||f (xk )|| − γ 2 2
                2               2
                                           ||f ′ (xk )||2 = 2 γ ||f ′ (xk )||2 ε ≥ 22γ ||f ′ (xk )||2
for all k ∈ K.
    Hence, by putting

                                        2−i ′
                         ūk = xk −        f (xk ) = xk + 2−i (zk − xk ),                       (18)
                                         γ

                                        2−i ′
      µ̄k = f (xk ) + ηk (1 − 2−i ) −      ||f (xk )||2 = θk (xk ) + 2−i (φk − θk (xk )),       (19)
                                        2γ
we have f (ūk ) ≤ µ̄k , consequently,

                                         (ūk , µ̄k ) ∈ epi(f ).                                (20)

    Now lets prove that there exist a constant c > 0 such that inequality (16) is defined.
Note that according to conditions of the lemma the parameter αk (xk ) is constructed
by 2 ways. Firstly, if inclusion (14) is determined for some k ∈ K, then αk (xk ) = 1.
Secondly, let the parameter αk (xk ) is defined in accordance with (15). In this case
according to (5) the point (uk (xk ), µk (xk )) is situated in the intersection of the segment
[(zk , φk ), (xk , θk (xk ))] with the border of the set epi(f ), consequently, we have

                                 (uk (xk ), µk (xk )) ∈
                                                      / intepi(f ).                             (21)

Moreover, in view of (18), (19) we get

                               (ūk , µ̄k ) ∈ [(zk , φk ), (xk , θk (xk ))].                    (22)
                                                              A Multi-Step Bundle Method       319

Now taking into account (20)-(22) lets suppose that

                                           2−i > αk (xk ).                                    (23)

Then there exists a number βk > 0 such that
                                               (                                      )
            (ūk , µ̄k ) = (xk , θk (xk )) + βk (uk (xk ), µk (xk )) − (xk , θk (xk )) .      (24)

Hence, in view of (18), (19), (23) and using equality (5) while (uk (b), µk (b)) = (uk (xk ), µk (xk ))
                            −i
the expression βk = αk2(xk ) > 1 is defined. Further, from (24) it follows that (uk (xk ), µk (xk )) =
                     (                              )                    (                              )
(xk , θk (xk )) + β1k (ūk , µ̄k ) − (xk , θk (xk )) = (ūk , µ̄k ) + β¯k (xk , θk (xk )) − (ūk , µ̄k ) ,
where β̄k = (1 − β1k ) ∈ (0, 1). Then from Theorem 3 [7, p. 153] it follows that
(uk (xk ), µk (xk )) ∈ intepi(f ) which contradicts to condition (21). Thus, assumption
(23) is wrong, consequently, αk (xk ) ≥ 2−i . Now taking into account all cases of con-
struction of the parameter αk (xk ) for all k ∈ K we have αk (xk ) ≥ c > 0. The theorem
is proved.

Remark 3. If inclusion (14) is not satisfied for some k ∈ K, then (uk (xk ), µk (xk ))
should be found as a boundary point of the set epi(f ) by solving one-dimensional
equation (15). Note that such equation is also solved in embedding methods [1] to
construct cutting hyperplanes.

Theorem 1. Suppose that the sequence {xk }, k ∈ K, is constructed by the proposed
                                  ∑∞ of Lemma 3, the set Mη (x0 ) = {x ∈ Rn : f (x) ≤
method in accordance with conditions
f (x0 )+η} is bounded, where η = k=0 ηk . Then the sequence {xk }, k ∈ K, is bounded,
and the following equality takes place lim f (xk ) = f ∗ . Moreover, convergence rate
                                             k∈K

                                                c0
                              f (xk ) − f ∗ ≤      ,   k ∈ K,      k ≥ 1,                     (25)
                                                k
is determined, where c0 > 0.

Proof. In accordance with (13) and Lemma 3 the inequality
                                                            c
                            f (xk+1 ) ≤ f (xk ) + ηk −        ||f ′ (xk )||2 .                (26)
                                                           2γ
is defined. Hence,
                                f (xk+1 ) ≤ f (xk ) + ηk ,      k ∈ K.                        (27)
Since f (xk ) ≥ f ∗ > −∞, then from Lemma 2 [7, p. 87] and (10), (27) it follows that
there exists a limit lim f (xk ) ≥ f ∗ , consequently, lim (f (xk ) − f (xk+1 )) = 0. Summing
                     k∈K                               k∈K
                                                                        ∑m−1
inequalities (27) from 0 to m − 1 by k we have f (xm ) ≤ f (x0 ) + k=0 ηk ≤ f (x0 ) + η.
Hence, {xk } ⊂ Mη (x0 ), and the sequence {xk }, k ∈ K, is bounded. Further, in view of
(26) we get lim f ′ (xk ) = 0, consequently, lim f (xk ) = f ∗ .
              k∈K                                  k∈K
    Now lets obtain convergence rate of the iteration process. For all x∗ ∈ X ∗ we have

               0 ≤ f (xk ) − f ∗ ≤ ||f ′ (xk )||||xk − x∗ || ≤ d||f ′ (xk )||,   k ∈ K,       (28)
320     R. Yarullin

where d ≥ diamMη (x0 ). Suppose ak = f (xk ) − f ∗ . Then from (26), (28) it follows that
ak − ak+1 = f (xk ) − f (xk+1 ) ≥ 2γ
                                   c
                                     ||f ′ (xk )||2 − ηk ≥ 2γd
                                                            c
                                                               a2k − ηk . Then in view of (10)
                                                             a2
                            c } we have ak+1 ≤ ak − A + k2 for all k ∈ K. Hence,
and by putting A = max{l, 2Ld                         k    A

from Lemma 5 [7, p. 89] under conditions I0 = K and I1 = ∅ convergence rate (25) is
proved.


References
1. Bulatov, V.P.: Embedding Methods in Optimization Problems. Nauka, Novosibirsk (1977)
   [in Russian].
2. Kiwiel, K.C.: An ellipsoid trust region bundle method for nonsmooth convex minimization.
   SIAM Journal on Control and Optimization 27 (4), 737–757 (1989).
3. Kiwiel, K.C.: Exact penalty functions in proximal bundle methods for constrained convex
   nondifferentiable minimization. Math. Program. 52 (2), 285–302 (1991).
4. Lemarechal C., Nemirovskii A., Nesterov Y. : New variants of bundle methods Math.
   Program. 69, 111–148 (1995).
5. Lemarechal C., Strodiot J.J., Bihain A.: On a bundle algorithm for nonsmooth optimiza-
   tion. Nonlinear Programming, 4, 245–281 (1981).
6. Sagastizabal, C.: Composite proximal bundle method. Math. Program. 140 (1), 189–233,
   (2013).
7. Vasiliev, F.P.: Optimization Methods. Factorial Press, Moscow (2002) [in Russian].