A new algorithm of nonlinear conjugate gradient method with strong convergence

Shi, Zhen-Jun; Guo, Jinhua

Abstract

The nonlinear conjugate gradient method is a very useful technique for solving large-scale minimization problems and has wide applications in many fields. In this paper, we present a new nonlinear conjugate gradient algorithm with strong convergence for unconstrained minimization problems. The new algorithm automatically generates an adequate trust region radius at each iteration and, under some mild conditions, is globally convergent with a linear convergence rate. Numerical results show that the new algorithm is efficient in practical computation and superior to other similar methods in many situations.

Key words: unconstrained optimization, nonlinear conjugate gradient method, global convergence, linear convergence rate.


Zhen-Jun Shi^I,II; Jinhua Guo^II

^I College of Operations Research and Management, Qufu Normal University, Rizhao, Shandong 276826, P.R. China. E-mails: zjshi@qrnu.edu.cn; zjshi@umd.umich.edu

^II Department of Computer and Information Science, University of Michigan, Dearborn, Michigan 48128-1491, USA. E-mail: jinhua@umd.umich.edu


Mathematical subject classification: 90C30, 65K05, 49M37.


1 Introduction

Consider the unconstrained minimization problem

$$\min_{x \in \mathbb{R}^n} f(x),$$

where Rⁿ is the n-dimensional Euclidean space and f : Rⁿ → R is a continuously differentiable function.

When n is very large (for example, n > 10⁶), the problem is called a large-scale minimization problem. Solving such problems requires special algorithms that avoid the high cost of storing and computing matrices such as the Hessian or its approximations.

The conjugate gradient method is a suitable approach to solving large-scale minimization problems. For strictly convex quadratic objective functions, the conjugate gradient method with exact line searches has the finite convergence property: in exact arithmetic it reaches the minimizer in at most n iterations. If the objective function is not quadratic or inexact line searches are used, the conjugate gradient method loses the finite convergence property and may even fail to be globally convergent [6, 20].
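To illustrate the finite convergence property, the following minimal Python sketch (not taken from the paper) applies the linear conjugate gradient method with exact line searches to a strictly convex quadratic f(x) = ½xᵀAx − bᵀx. The function name `linear_cg`, the tolerance and the test matrix are illustrative assumptions. The loop is capped at n iterations, so reaching the solution within the cap demonstrates the property in floating point.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Product of a dense matrix (list of rows) with a vector."""
    return [dot(row, x) for row in A]


def linear_cg(A, b, x0, tol=1e-12):
    """Minimize f(x) = 0.5 x^T A x - b^T x for symmetric positive definite A.

    Uses exact line searches and the Fletcher-Reeves beta, which coincides with
    the other classical choices on quadratics under exact line searches.
    Returns (x, number_of_iterations); at most n = len(b) iterations are run.
    """
    x = list(x0)
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # gradient A x - b
    d = [-gi for gi in g]                               # d_0 = -g_0
    k = 0
    while dot(g, g) ** 0.5 > tol and k < len(b):
        Ad = matvec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)                 # exact line search
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi + alpha * adi for gi, adi in zip(g, Ad)]
        beta = dot(g_new, g_new) / dot(g, g)            # Fletcher-Reeves beta
        d = [-gi + beta * di for gi, di in zip(g_new, d)]
        g = g_new
        k += 1
    return x, k
```

For A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]] and b = (1, 2, 3), starting from the origin, `linear_cg` returns x ≈ (2/9, 1/9, 13/9) within three iterations.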

When the conjugate gradient method is used to minimize non-quadratic objective functions, the resulting algorithm is called a nonlinear conjugate gradient method [17, 18]. Nonlinear conjugate gradient methods have been studied extensively [3, 4, 5], and some new nonlinear conjugate gradient methods have appeared [8, 11].

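The conjugate gradient method has the form

$$x_{k+1} = x_k + \alpha_k d_k, \qquad k = 0, 1, 2, \ldots,$$

where x₀ is an initial point, α_k is a step size, and d_k can be taken as

$$d_k = \begin{cases} -g_k, & k = 0, \\ -g_k + \beta_k d_{k-1}, & k \ge 1, \end{cases}$$

in which g_k = ∇f(x_k). Different choices of β_k determine different conjugate gradient methods. Some well-known formulae for β_k are the Fletcher–Reeves (FR) [10], Polak–Ribière–Polyak (PRP) [15, 16], Hestenes–Stiefel (HS) [12], conjugate descent (CD) [11], Dai–Yuan (DY) [8] and Liu–Storey (LS) [13] formulae:

$$
\begin{aligned}
\beta_k^{FR} &= \frac{\|g_k\|^2}{\|g_{k-1}\|^2}, &
\beta_k^{PRP} &= \frac{g_k^T (g_k - g_{k-1})}{\|g_{k-1}\|^2}, \\
\beta_k^{HS} &= \frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T (g_k - g_{k-1})}, &
\beta_k^{CD} &= -\frac{\|g_k\|^2}{d_{k-1}^T g_{k-1}}, \\
\beta_k^{DY} &= \frac{\|g_k\|^2}{d_{k-1}^T (g_k - g_{k-1})}, &
\beta_k^{LS} &= -\frac{g_k^T (g_k - g_{k-1})}{d_{k-1}^T g_{k-1}}.
\end{aligned}
$$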
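The six classical choices of β_k (FR, PRP, HS, CD, DY and LS) can be computed directly from g_k, g_{k−1} and d_{k−1}. The following minimal Python sketch uses plain lists; the function names `cg_betas` and `next_direction` are illustrative, not from the paper, and no safeguards against zero denominators are included.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def cg_betas(g, g_prev, d_prev):
    """Return the classical conjugate gradient parameters beta_k.

    g      -- current gradient g_k
    g_prev -- previous gradient g_{k-1}
    d_prev -- previous search direction d_{k-1}
    """
    y = [a - b for a, b in zip(g, g_prev)]   # y_{k-1} = g_k - g_{k-1}
    gg = dot(g, g)
    return {
        "FR": gg / dot(g_prev, g_prev),
        "PRP": dot(g, y) / dot(g_prev, g_prev),
        "HS": dot(g, y) / dot(d_prev, y),
        "CD": -gg / dot(d_prev, g_prev),
        "DY": gg / dot(d_prev, y),
        "LS": -dot(g, y) / dot(d_prev, g_prev),
    }


def next_direction(g, beta, d_prev):
    """Search direction d_k = -g_k + beta_k d_{k-1} for k >= 1."""
    return [-gi + beta * di for gi, di in zip(g, d_prev)]
```

For g_{k−1} = (1, 0), d_{k−1} = −g_{k−1} and g_k = (0.5, 1), the sketch gives FR = CD = 1.25, PRP = LS = 0.75, HS = 1.5 and DY = 2.5. CD coincides with FR and LS with PRP here because d_{k−1}ᵀg_{k−1} = −||g_{k−1}||² when d_{k−1} is the steepest-descent direction.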

Although some conjugate gradient methods perform well numerically on large-scale minimization problems, they may fail to converge globally in some situations [6]. This raises two questions. Can we construct a conjugate gradient method that has both global convergence and good numerical performance in practical computation? Can we design a conjugate gradient method that is suitable for ill-conditioned minimization problems (problems in which the Hessian of the objective function at a stationary point is ill-conditioned)?

Yuan and Stoer [19] studied the conjugate gradient method on a subspace and obtained a new conjugate gradient method. In their algorithm, the search direction at the kth iteration (k ≥ 1) is taken from the subspace span{g_k, d_{k−1}}, i.e.,

$$d_k = -\gamma_k g_k + \beta_k d_{k-1},$$

where γ_k and β_k are parameters.
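For a strictly convex quadratic f(x) = ½xᵀAx − bᵀx, minimizing f over x_k + span{g_k, d_{k−1}} reduces to a 2×2 linear system in (γ, β). The following Python sketch illustrates this subspace idea under that quadratic assumption only; it is neither the paper's Algorithm (A) nor necessarily Yuan and Stoer's exact formulation, and `subspace_step` is a hypothetical name.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Product of a dense matrix (list of rows) with a vector."""
    return [dot(row, x) for row in A]


def subspace_step(A, b, x, d_prev):
    """Minimize f(x) = 0.5 x^T A x - b^T x over x + span{g, d_prev}.

    The step is d = -gamma * g + beta * d_prev. Setting the partial derivatives
    of f(x + d) with respect to gamma and beta to zero gives the 2x2 system
        [ g^T A g   -g^T A p ] [gamma]   [ g^T g ]
        [-g^T A p    p^T A p ] [beta ] = [-g^T p ],   p = d_prev.
    Returns (x_new, d).
    """
    g = [ax - bi for ax, bi in zip(matvec(A, x), b)]   # gradient at x
    Ag, Ap = matvec(A, g), matvec(A, d_prev)
    a11, a12, a22 = dot(g, Ag), -dot(g, Ap), dot(d_prev, Ap)
    r1, r2 = dot(g, g), -dot(g, d_prev)
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("g and d_prev must be linearly independent")
    gamma = (r1 * a22 - a12 * r2) / det                # Cramer's rule
    beta = (a11 * r2 - a12 * r1) / det
    d = [-gamma * gi + beta * pi for gi, pi in zip(g, d_prev)]
    return [xi + di for xi, di in zip(x, d)], d
```

With A = [[3, 1], [1, 2]] and b = (1, 1), one exact steepest-descent step from the origin along d₀ = (1, 1) reaches x₁ = (2/7, 2/7). A single subspace step from x₁ then lands on the minimizer (0.2, 0.4), because span{g₁, d₀} is all of R².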

Motivated by [19], we apply the trust region technique to the conjugate gradient method and propose a new nonlinear conjugate gradient algorithm. The new algorithm has both global convergence and good numerical performance in practical computation. Theoretical analysis and numerical results show that the proposed algorithm is promising and can solve some ill-conditioned minimization problems.

The paper is organized as follows. Section 1 is the introduction. In Section 2, we introduce the new conjugate gradient method. In Sections 3 and 4, we analyze the global convergence and convergence rate of the new method. Numerical results are reported in Section 5.

2 New Algorithm

We first assume that

(H1) The objective function f(x) is bounded below on Rⁿ.

(H2) The gradient g(x) = ∇f(x) of the objective function f(x) is Lipschitz continuous on an open convex set B that contains the level set L(x₀) = {x | f(x) ≤ f(x₀)}, i.e., there exists L > 0 such that

$$\|g(x) - g(y)\| \le L\,\|x - y\|, \qquad \forall\, x, y \in B.$$

Lemma 2.1. Assume that (H2) holds and x_k, x_k + d_k ∈ B. Then

Proof. The result follows easily from the mean value theorem; the proof is omitted.

Algorithm (A)

Step 0. Choose parameters μ ∈ (0, 1), ρ ∈ (0, 1) and M₀ ≫ L₀ > 0; given an initial point x₀ ∈ Rⁿ, set k := 0.

Step 1. If ||g_k|| = 0 then stop else go to Step 2.

Step 2. Set x_{k+1} = x_k + d_k(α_k), where α_k is the largest value in {1, ρ, ρ², …} such that

in which

and (γ, β)^T ∈ R² is a solution to

Step 3.

or

Step 4. Set k := k + 1 and go to Step 1.
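Step 2 selects α_k as the largest member of {1, ρ, ρ², …} that passes an acceptance test. A minimal Python sketch of this selection loop might look as follows. The acceptance test is left as a caller-supplied predicate because its exact formula is not reproduced here; `largest_trial_step` and the `max_trials` safeguard are assumptions for illustration.

```python
from typing import Callable


def largest_trial_step(accept: Callable[[float], bool], rho: float = 0.5,
                       max_trials: int = 60) -> float:
    """Return the largest alpha in {1, rho, rho^2, ...} for which accept(alpha) holds.

    `accept` stands in for the acceptance test of Step 2 (hypothetical here).
    Raises ValueError for rho outside (0, 1) and RuntimeError if no trial
    step is accepted within max_trials reductions.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    alpha = 1.0
    for _ in range(max_trials):
        if accept(alpha):
            return alpha
        alpha *= rho          # backtrack: alpha -> rho * alpha
    raise RuntimeError("no acceptable step size found")
```

For example, `largest_trial_step(lambda a: a <= 0.3)` rejects 1 and 0.5 before accepting 0.25.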

Remark 2.1. The main computational task in Algorithm (A) is solving (15). If k = 0, problem (15) has the solution γ = α/L_k. If k ≥ 1, problem (15) has the solution

where y_k = (γ′, β′)^T is a solution of the following system of two equations in two variables:

Moreover, L_k is an approximation to the Lipschitz constant L of the gradient of the objective function. If we set β ≡ 0, Algorithm (A) is very similar to the Barzilai–Borwein (BB) method [1, 7]. However, Algorithm (A) is globally convergent.

Lemma 2.2. If (H2) holds then

In fact, by the Cauchy–Schwarz inequality, we have

and thus L_{k+1} should lie in the interval

Generally, we take

in practical computation.
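The Cauchy–Schwarz argument behind Lemma 2.2 can be checked numerically: with s = x_{k+1} − x_k and y = g_{k+1} − g_k, (H2) gives sᵀy/||s||² ≤ ||y||/||s|| ≤ L. The Python sketch below computes both quotients and clips them to [L₀, M₀]. The clipping rule and the name `lipschitz_estimates` are assumptions for illustration; they are not claimed to be the exact update formulas of Step 3.

```python
import math


def lipschitz_estimates(x_old, x_new, g_old, g_new, L0=1e-5, M0=1e30):
    """Two curvature-based estimates of the Lipschitz constant of the gradient.

    Returns (rayleigh, secant) where
      rayleigh = s^T y / ||s||^2   and   secant = ||y|| / ||s||,
    with s = x_new - x_old and y = g_new - g_old, each clipped to [L0, M0].
    By Cauchy-Schwarz, rayleigh <= secant before clipping.
    """
    s = [b - a for a, b in zip(x_old, x_new)]
    y = [b - a for a, b in zip(g_old, g_new)]
    ss = sum(si * si for si in s)
    if ss == 0.0:
        raise ValueError("x_new must differ from x_old")
    rayleigh = sum(si * yi for si, yi in zip(s, y)) / ss
    secant = math.sqrt(sum(yi * yi for yi in y)) / math.sqrt(ss)

    def clip(v):
        # keep the estimate inside the safeguard interval [L0, M0]
        return min(max(v, L0), M0)

    return clip(rayleigh), clip(secant)
```

For f(x) = ½(x₁² + 4x₂²), whose gradient has Lipschitz constant L = 4, moving from (1, 1) to (0, 0) gives the estimates 2.5 and √8.5 ≈ 2.92, both below L and ordered as Cauchy–Schwarz predicts.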

3 Global convergence

Lemma 3.2. Assume that (H1) and (H2) hold. Then

Proof. Set d̃_k(α) = −γg_k with γ chosen so that ||d̃_k(α)|| = α||g_k||/L_k, i.e., γ = α/L_k. Then d̃_k(α) is a feasible solution to (15). Noting that α ∈ (0, 1] and that d_k(α) is an optimal solution to (15), we have

Theorem 3.1. Assume that (H1) and (H2) hold and that Algorithm (A) generates an infinite sequence {x_k}. Then

$$\lim_{k \to \infty} \|g_k\| = 0. \tag{23}$$

Proof. From (H1), (H2) and Lemmas 2.1, 2.2 and 3.1, it is easy to obtain that

This shows that if α < then we have > μ.

Therefore, there exists η₀ > 0 such that α_k > η₀. By Lemma 3.2 and the procedure of Algorithm (A), we have

By (H1) and the above inequality, {f_k} is a monotonically decreasing sequence that is bounded below. Therefore, {f_k} has a limit, and thus

which implies that (23) holds.

4 Linear convergence rate

We further assume that

(H3) The sequence {x_k} generated by Algorithm (A) converges to x*, ∇²f(x*) is a positive definite matrix, and f(x) is twice continuously differentiable on N(x*, ε₀) = {x | ||x − x*|| < ε₀}.

Lemma 4.1. Assume that (H3) holds. Then there exist m′, M′ and ε with 0 < m′ < M′ and ε < ε₀ such that

and thus

By (26) and (27), we can also obtain from the Cauchy–Schwarz inequality that

and

For a proof, see, e.g., [11].

Lemma 4.2. Assume that (H3) holds and Algorithm (A) generates an infinite sequence {x_k}. Then

Proof. Without loss of generality, suppose that x₀ ∈ N(x*, ε). By Lemma 4.1, (H1) and (H2) hold. By the proof of Theorem 3.1, as long as

we have

Therefore,

which shows that there exists η₀:

such that α_k > η₀. The proof is finished.

Theorem 4.1. If the conditions of Lemma 4.2 hold, then {x_k} converges to x* at least R-linearly.

Proof. By the proof of Theorem 3.1 and Lemma 4.2, and noting Lemmas 2.2 and 4.1, we have

where

By setting

we can prove that θ < 1. In fact, since m′ < L ≤ max(L, M₀) and η₀ < 1, by the definition of η, we obtain

By setting

(obviously ω < 1), we obtain

By Lemma 4.1 we have

and thus,

We finally have

which shows that {x_k} converges to x* at least R-linearly.
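R-linear convergence means that there are constants c > 0 and ω ∈ (0, 1) with ||x_k − x*|| ≤ cω^k for all k, so that limsup ||x_k − x*||^{1/k} ≤ ω < 1. A simple empirical diagnostic, sketched below in Python under the assumption that x* is known, is to compute these root quotients from the observed errors; `root_factors` is a hypothetical helper name.

```python
import math


def root_factors(errors):
    """Return the root quotients e_k^(1/k), k >= 1, for errors e_k = ||x_k - x*||.

    For an R-linearly convergent sequence the tail of this list stays below
    some omega < 1. Zero errors (exact convergence) are reported as 0.0.
    """
    factors = []
    for k, e in enumerate(errors):
        if k == 0:
            continue  # the root quotient is undefined for k = 0
        factors.append(0.0 if e == 0.0 else math.exp(math.log(e) / k))
    return factors
```

For errors e_k = 3 · 0.5^k, the root quotients 3^{1/k} · 0.5 decrease towards 0.5, reflecting R-linear convergence with any ω slightly above 0.5.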

5 Numerical results

We choose the following numerical examples from [2, 9, 14] to test the new conjugate gradient method.

Problem 1. Penalty function I (problem (23) in [14])

Problem 2. Variable dimensioned function (problem (25) in [14])

Problem 3. Trigonometric function (problem (26) in [14])

Problem 4. A penalty function (problem (18) in [2])

Problem 5. Extended Rosenbrock function (problem (21) in [14])

Problem 6. Penalty function II (modification of problem (24) in [14])

Problem 7. Brown almost linear function (problem (27) in [14])

Problem 8. Linear function-rank 1 (problem (33) in [14], with modified initial values)
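Two of the listed problems can be sketched in their standard forms from Moré, Garbow and Hillstrom [14]: the extended Rosenbrock function (Problem 5) and Penalty function I (Problem 1, with the standard constant a = 10⁻⁵), together with gradients and standard starting points. The dimensions n used in the experiments are not stated here, and the modified variants (Problems 6 and 8) are not covered; the function names are illustrative.

```python
def ext_rosenbrock(x):
    """Extended Rosenbrock function (Moré, Garbow and Hillstrom, problem 21).

    f(x) = sum_i [100 (x_{2i} - x_{2i-1}^2)^2 + (1 - x_{2i-1})^2]; len(x) must be even.
    Returns (f, grad).
    """
    if len(x) % 2:
        raise ValueError("dimension must be even")
    f, g = 0.0, [0.0] * len(x)
    for i in range(0, len(x), 2):
        u, v = x[i], x[i + 1]
        t = v - u * u
        f += 100.0 * t * t + (1.0 - u) ** 2
        g[i] = -400.0 * u * t - 2.0 * (1.0 - u)
        g[i + 1] = 200.0 * t
    return f, g


def penalty_one(x, a=1e-5):
    """Penalty function I (Moré, Garbow and Hillstrom, problem 23).

    f(x) = a * sum_i (x_i - 1)^2 + (sum_j x_j^2 - 1/4)^2. Returns (f, grad).
    """
    s = sum(xi * xi for xi in x) - 0.25
    f = a * sum((xi - 1.0) ** 2 for xi in x) + s * s
    g = [2.0 * a * (xi - 1.0) + 4.0 * s * xi for xi in x]
    return f, g


def rosenbrock_start(n):
    """Standard starting point (-1.2, 1, -1.2, 1, ...)."""
    return [-1.2 if i % 2 == 0 else 1.0 for i in range(n)]


def penalty_one_start(n):
    """Standard starting point x_j = j, j = 1, ..., n."""
    return [float(j) for j in range(1, n + 1)]
```

At the standard starting point with n = 2, `ext_rosenbrock` returns f = 24.2 and gradient (−215.6, −88).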

In the numerical experiments, we set the parameters μ = 0.013, ρ = 0.5, L₀ = 0.00001 and M₀ = 10³⁰. The procedure was programmed in Matlab 6.1, and the stopping criterion is

$$\|g_k\| < 10^{-8}\,\|g_0\|.$$
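The reported parameter values and the relative stopping test can be collected in a small configuration object. In this Python sketch, the class `NMParameters` and the helper `should_stop` are hypothetical names; only the numeric values (μ = 0.013, ρ = 0.5, L₀ = 10⁻⁵, M₀ = 10³⁰ and the factor 10⁻⁸) come from the experimental setup.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NMParameters:
    """Parameter values reported for the experiments."""
    mu: float = 0.013       # acceptance threshold in Step 2
    rho: float = 0.5        # backtracking factor for alpha
    L0: float = 1e-5        # lower safeguard for L_k
    M0: float = 1e30        # upper safeguard for L_k
    rel_tol: float = 1e-8   # stop when ||g_k|| < rel_tol * ||g_0||


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def should_stop(g_k, g_0, params=NMParameters()):
    """Relative gradient stopping test ||g_k|| < rel_tol * ||g_0||."""
    return norm(g_k) < params.rel_tol * norm(g_0)
```

Using a relative rather than absolute tolerance makes the test insensitive to the scale of the initial gradient, which varies widely across the test problems.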

The numerical results are summarized in Table 1. The strong Wolfe line search is used in the traditional conjugate gradient methods FR, PRP, CD, DY, HS and LS.


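Strong Wolfe line search. The step size α_k is defined by

$$f(x_k + \alpha_k d_k) \le f_k + \sigma_1 \alpha_k g_k^T d_k$$

and

$$|g(x_k + \alpha_k d_k)^T d_k| \le -\sigma_2\, g_k^T d_k,$$

in which 0 < σ₁ < σ₂ < 1.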
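The strong Wolfe conditions, f(x_k + αd_k) ≤ f_k + σ₁αg_kᵀd_k and |g(x_k + αd_k)ᵀd_k| ≤ −σ₂g_kᵀd_k with 0 < σ₁ < σ₂ < 1, can be checked with a short predicate. In this Python sketch, `satisfies_strong_wolfe` is a hypothetical helper, and the defaults σ₁ = 10⁻⁴ and σ₂ = 0.1 are illustrative choices, not the values used for Table 1.

```python
def dot(u, v):
    """Inner product of two vectors stored as lists."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_strong_wolfe(f, grad, x, d, alpha, sigma1=1e-4, sigma2=0.1):
    """Check the strong Wolfe conditions for step alpha along direction d.

    f, grad        -- callables returning f(x) and the gradient g(x)
    sigma1, sigma2 -- constants with 0 < sigma1 < sigma2 < 1
    """
    if not 0.0 < sigma1 < sigma2 < 1.0:
        raise ValueError("need 0 < sigma1 < sigma2 < 1")
    gd = dot(grad(x), d)                      # g_k^T d_k, negative for descent
    x_new = [xi + alpha * di for xi, di in zip(x, d)]
    sufficient_decrease = f(x_new) <= f(x) + sigma1 * alpha * gd
    curvature = abs(dot(grad(x_new), d)) <= -sigma2 * gd
    return sufficient_decrease and curvature
```

For f(x) = ½x² at x = 1 with d = −1, the exact step α = 1 passes both conditions, α = 0.5 fails the curvature condition, and α = 2 fails the sufficient-decrease condition.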

In Table 1, each pair of numbers gives the number of iterations and the number of function evaluations. The entry "fail" means that the corresponding conjugate gradient method failed to solve the problem, and "CPU" denotes the total CPU time (in seconds) that the corresponding algorithm needed to solve all the problems. Table 1 shows that the new nonlinear conjugate gradient method (NM) is effective in practical computation and, in terms of total CPU time, superior to the other methods in many situations. Moreover, PRP, HS and LS may fail to converge on some problems, while NM converges stably on all the test problems. The new method has the strong convergence property and is more stable than the FR, CD and DY conjugate gradient methods.

The numerical results also show that the proposed method has the best overall numerical performance among the tested methods. Moreover, the estimate of the Lipschitz constant of the gradient of the objective function plays an important role in the new method.

6 Conclusion

In this paper, we presented a new nonlinear conjugate gradient method with strong convergence for unconstrained minimization problems. The new method automatically generates an adequate trust region radius at each iteration and, under some mild conditions, is globally convergent with a linear convergence rate. Numerical results showed that the new conjugate gradient method is effective in practical computation and superior to other similar conjugate gradient methods in many situations.

Acknowledgements. The authors are very grateful to the referees and the editor for their valuable comments and suggestions that greatly improved the paper.

Received: 10/V/07. Accepted: 24/IX/07.

#724/07.

[1] J. Barzilai and J.M. Borwein, Two point step size gradient methods. IMA J. Numer. Anal., 8 (1988), 141-148.
[2] A.R. Conn, N.I.M. Gould and P.L. Toint, Testing a class of methods for solving minimization problems with simple bounds on the variables. Math. Comput., 50 (1988), 399-430.
[3] Y.H. Dai, Conjugate gradient methods with Armijo-type line searches. Acta Mathematicae Applicatae Sinica, English Series, 18 (2002), 123-130.
[4] Y.H. Dai, J.Y. Han, G.H. Liu, D.F. Sun, H.X. Yin and Y. Yuan, Convergence properties of nonlinear conjugate gradient methods. SIAM J. Optim., 10 (2000), 345-358.
[5] Y.H. Dai and Y. Yuan, Convergence properties of the conjugate descent method. Adv. Math., 25(6) (1996), 552-562.
[6] Y.H. Dai and Y. Yuan, Nonlinear Conjugate Gradient Methods. Shanghai Science and Technology Press, Shanghai, 2000.
[7] Y.H. Dai and L.Z. Liao, R-linear convergence of the Barzilai and Borwein gradient method. IMA J. Numer. Anal., 22 (2002), 1-10.
[8] Y.H. Dai and Y. Yuan, A nonlinear conjugate gradient method with a strong global convergence property. SIAM J. Optim., 10 (1999), 177-182.
[9] E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles. Math. Program., Ser. A, 91(2) (2002), 201-213.
[10] R. Fletcher and C. Reeves, Function minimization by conjugate gradients. Computer J., 7 (1964), 149-154.
[11] R. Fletcher, Practical Methods of Optimization, Vol. 1: Unconstrained Optimization. John Wiley & Sons, New York, 1987.
[12] M.R. Hestenes and E.L. Stiefel, Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Standards, 49 (1952), 409-436.
[13] Y. Liu and C. Storey, Efficient generalized conjugate gradient algorithms: I. Theory. J. Optim. Theory Appl., 69 (1991), 129-137.
[14] J. Moré, B. Garbow and K. Hillstrom, Testing unconstrained optimization software. ACM Transactions on Mathematical Software, 7 (1981), 17-41.
[15] E. Polak and G. Ribière, Note sur la convergence de méthodes de directions conjuguées. Rev. Française Informat. Recherche Opérationnelle, 3e Année, 16 (1969), 35-43.
[16] B.T. Polyak, The conjugate gradient method in extremal problems. USSR Comp. Math. and Math. Phys., 9 (1969), 94-112.
[17] S. Sanmatías and E. Vercher, A generalized conjugate gradient algorithm. J. Optim. Theory Appl., 98 (1998), 489-502.
[18] Z.J. Shi, Nonlinear conjugate gradient method with exact line search (in Chinese). Acta Math. Sci. Ser. A Chin. Ed., 24(6) (2004), 675-682.
[19] Y. Yuan and J. Stoer, A subspace study on conjugate gradient algorithms. Z. Angew. Math. Mech., 75 (1995), 69-77.
[20] Y. Yuan and W.Y. Sun, Optimization Theory and Methods. Science Press, Beijing, 1997.


The work was supported in part by NSF CNS-0521142, USA.

Publication Dates

Publication in this collection
02 Apr 2008
Date of issue
2008


This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
