
Using truncated conjugate gradient method in trust-region method with two subproblems and backtracking line search*

*Partially supported by Chinese NSF grant 10831006 and CAS grant kjcx-yw-s7.

Mingyun Tang^I; Ya-Xiang Yuan^II

^I College of Science, China Agricultural University, Beijing, 100083, China

^II State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, P.O. Box 2719, Beijing, 100190, China. E-mails: tangmy@lsec.cc.ac.cn / yyx@lsec.cc.ac.cn

ABSTRACT

A trust-region method with two subproblems and backtracking line search for solving unconstrained optimization is proposed. At every iteration, we use the truncated conjugate gradient method or a variation of it to solve one of the two subproblems approximately. Backtracking line search is carried out when the trust-region trial step fails. We show that this method has the same convergence properties as the traditional trust-region method based on the truncated conjugate gradient method. Numerical results show that this method is as reliable as the traditional one and more efficient in terms of iterations, CPU time and evaluations.

Mathematical subject classification: Primary: 65K05; Secondary: 90C30.

Key words: truncated conjugate gradient, trust-region, two subproblems, backtracking, convergence.

1 Introduction

Consider the unconstrained optimization problem

$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$

where ƒ is a real-valued twice continuously differentiable function, which we assume to be bounded below. Unconstrained optimization problems are essential in mathematical programming because they occur frequently in many real-world applications, and methods for such problems are fundamental in the sense that they can either be applied directly or be extended to optimization problems with constraints. Many effective algorithms have been designed for unconstrained optimization (see [6, 13]); most of them can be classified into two categories: line search algorithms and trust-region algorithms.

The trust-region method is efficient for solving (1). It has a mature framework, strong convergence properties and satisfactory numerical results (see [4]). However, sometimes the trust-region step may be too conservative, especially when the objective function has large "convex basins": the standard trust-region technique may require quite a few iterations to stretch the trust region until it contains the local minimizer. Thus, it is natural to consider modifying the standard trust-region method to obtain a new algorithm that maintains the convergence properties of the standard trust-region method while needing less computational cost.

In previous research [12], we obtained a trust-region method with two subproblems and backtracking line search. A subproblem without the trust-region constraint was introduced into the trust-region framework in order to use the unit-stepsize Newton step. This unconstrained subproblem normally gives a longer trial step, so it is likely that the overall algorithm will reduce the computational cost. Based on the information from the previous iterations, either the trust-region step or the Newton step is used. Moreover, the idea of combining trust-region and line search techniques (see [8]) is also used in that algorithm. The algorithm inherits the convergence results of the traditional trust-region method and gives better performance by making good use of the Newton step and backtracking line search. However, as the exact minimizer of the trust-region subproblem is needed and the Cholesky factorization of the Hessian matrix is used to decide whether the model function is convex, that algorithm is obviously not suitable for large problems.

Therefore, in this paper we propose a new trust-region algorithm with two subproblems, using the truncated conjugate gradient method and a variation of it to solve the subproblems. Our method can be regarded as a modification of the standard trust-region method for unconstrained optimization in such a way that the Newton step can be taken as often as possible. A slightly modified truncated conjugate gradient method is used to compute the Newton step. The global convergence and local superlinear convergence results of the algorithm are also given in the paper.

The paper is organized as follows. In the next section, we give the framework of the method and describe our new algorithm. The convergence properties are presented in Section 3 and the numerical results are provided in Section 4. Some conclusions are given in Section 5.

2 The algorithm

First we briefly review the framework of trust-region method with two subproblems and backtracking line search.

One of the two subproblems is the trust-region subproblem. At the current iteration $x_k$, the trust-region subproblem is

$$\min_{\|s\| \le \Delta_k} m_k(s), \qquad (2)$$

where

$$m_k(s) = g_k^T s + \frac{1}{2} s^T H_k s,$$

$m_k(s)$ is the approximate model function of ƒ(x) within the trust region, $g_k = g(x_k) = \nabla f(x_k)$ and $H_k = \nabla^2 f(x_k)$ or an approximation of $\nabla^2 f(x_k)$. We usually choose $m_k$ to be the first three terms of the Taylor expansion of the objective function ƒ(x) at $x_k$ with the constant term ƒ($x_k$) omitted, as this term does not influence the iteration process.

Another subproblem is defined by

$$\min_{s \in \mathbb{R}^n} m_k(s) = g_k^T s + \frac{1}{2} s^T H_k s, \qquad (3)$$

where $g_k$ and $H_k$ have the same meanings as in (2). Since this subproblem does not have the trust-region constraint, we call it the unconstrained subproblem.

In the ideal situation, the unconstrained subproblem should be used when the model function is convex and gives an accurate approximation to the objective function. Define

$$\rho_k = \frac{f(x_k) - f(x_k + s_k)}{m_k(0) - m_k(s_k)}. \qquad (4)$$

The ratio ρk is used by trust-region algorithms to decide whether the trial step is acceptable and how to update the trust-region radius. In the method given in [12], we also used the value of ρk and the positive definiteness of ∇²ƒ(xk) to decide the model choice, since there the trust-region subproblem is solved exactly. In this paper, we use the truncated conjugate gradient method (see [1, 10]) to compute an approximate minimizer of the trust-region subproblem, as the Cholesky factorization cannot be used for large-scale problems. Now, we consider how to compute the unconstrained model minimizer approximately.
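For concreteness, the following Python/NumPy sketch (with hypothetical names, not the authors' code) shows how the quadratic model and the ratio ρk defined above can be evaluated, assuming ƒ, gk and Hk are available.

```python
import numpy as np

def model_value(g, H, s):
    # Quadratic model m_k(s) = g_k^T s + 0.5 s^T H_k s (constant term omitted).
    return g.dot(s) + 0.5 * s.dot(H.dot(s))

def rho(f, x, g, H, s):
    # Ratio of actual to predicted reduction, used to judge the trial step s.
    actual = f(x) - f(x + s)
    predicted = -model_value(g, H, s)   # m_k(0) - m_k(s), since m_k(0) = 0
    return actual / predicted
```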

Consider using the conjugate gradient method to solve the subproblem (3). The subscript i denotes the inner iteration number. If we do not know whether our quadratic model is strictly convex, precautions must be taken to deal with non-convexity if it arises. Similarly to the analysis of the truncated conjugate gradient method (see [4]), if we minimize $m_k$ without considering whether or not ∇²ƒ(xk) is positive definite, the following two possibilities might arise:

(i) the curvature $\langle p_i, H p_i\rangle$ is positive at each iteration. This means the current model function is convex along the direction $p_i$, as we expect. In this case, we just need to continue the iterations of the conjugate gradient method.

(ii) $\langle p_i, H p_i\rangle < 0$. This means the model function is not strictly convex: $m_k$ is unbounded below along the line $s_i + \sigma p_i$. The unconstrained subproblem is then obviously not suitable for reflecting the behaviour of the objective function around the current iterate. So we should add the trust-region constraint and minimize $m_k$ along $s_i + \sigma p_i$ as far as we can while staying within the trust region. In this case, what we need to do is to find the positive root of $\|s_i + \sigma p_i\| = \Delta_k$, as shown below.
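Since $\|s_i + \sigma p_i\|^2 = \Delta_k^2$ is a quadratic equation in σ, the required positive root has the closed form (a standard computation, recorded here for convenience)

$$\sigma = \frac{-s_i^T p_i + \sqrt{(s_i^T p_i)^2 + \|p_i\|^2 \left( \Delta_k^2 - \|s_i\|^2 \right)}}{\|p_i\|^2}.$$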

In order to avoid conjugate gradient iterations that make very little progress in the reduction of the quadratic model function, the iterations are also terminated once the relative progress condition used in Algorithm 2.1 below,

$$m_k(s_{i-1}) - m_k(s_i) < \kappa_g \bigl( m_k(0) - m_k(s_i) \bigr),$$

is satisfied (see [9]). The iterations are also terminated if the theoretical upper bound of n inner iterations is reached.

Now, we can give an algorithm for solving the unconstrained subproblem approximately. It is a variation of the truncated conjugate gradient method.

Algorithm 2.1

Step 1 (Initialization). Set s_0 = 0, v_0 = g_0 = ∇ƒ(x_k), p_0 = -g_0, curq = 0, preq = 1, kappag = 0.01, ε_k = min(kappag, ), itermax = n, i = 0.

Step 2 While i < itermax

    if preq - curq < kappag·(-curq), stop.

    κ_i = p_i^T H p_i;

    if κ_i < 0, then
        info := 1; if ||s_i|| > Δ_k, stop; else compute σ as the positive root of ||s_i + σ p_i|| = Δ_k, set s_{i+1} = s_i + σ p_i, and stop.
    end if

    α_i = ⟨g_i, v_i⟩ / κ_i,
    s_{i+1} = s_i + α_i p_i,
    update preq, curq and g_{i+1},
    if ||g_{i+1}|| / ||g_0|| < ε_k, stop.
    v_{i+1} = g_{i+1},
    β_i = ⟨g_{i+1}, v_{i+1}⟩ / ⟨g_i, v_i⟩,
    p_{i+1} = -v_{i+1} + β_i p_i,
    i := i + 1,
    go to Step 2.

The above modification of the conjugate gradient method can deal with negative curvature directions; such a technique is also discussed in [1]. In the above algorithm, if info equals 1, the current model function is not convex. It is easy to see that the computational cost of each iteration of the above algorithm is dominated by one matrix-vector multiplication. Thus, it is very likely that the above algorithm is faster than solving H_k s = -g_k by carrying out the Cholesky factorization of H_k directly.
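For illustration only, the following Python/NumPy sketch mirrors Algorithm 2.1 under the conventions stated above; the function name and interface are hypothetical and the tolerance ε_k is simply passed in, so this is a reading aid rather than the authors' implementation.

```python
import numpy as np

def unconstrained_model_cg(g0, H, delta, eps_k, kappa_g=0.01):
    """Sketch of Algorithm 2.1: CG on the unconstrained model, truncated on
    negative curvature.  Returns (s, info); info = 1 means the model is not convex."""
    n = g0.size
    s = np.zeros(n)
    g = g0.copy()
    p = -g
    prev_q, cur_q = 1.0, 0.0              # model values at previous / current inner iterates
    info = 0
    for _ in range(n):                    # theoretical upper bound of n inner iterations
        if prev_q - cur_q < kappa_g * (-cur_q):      # very little progress on the model
            break
        kappa = p.dot(H.dot(p))
        if kappa < 0:                     # negative curvature: model not strictly convex
            info = 1
            if np.linalg.norm(s) <= delta:
                # move to the trust-region boundary along p
                sp, pp = s.dot(p), p.dot(p)
                sigma = (-sp + np.sqrt(sp**2 + pp * (delta**2 - s.dot(s)))) / pp
                s = s + sigma * p
            break
        alpha = g.dot(g) / kappa          # alpha_i = <g_i, v_i> / kappa_i with v_i = g_i
        s = s + alpha * p
        g_new = g + alpha * H.dot(p)      # gradient of the model at the new iterate
        prev_q, cur_q = cur_q, g0.dot(s) + 0.5 * s.dot(H.dot(s))
        if np.linalg.norm(g_new) < eps_k * np.linalg.norm(g0):
            break
        beta = g_new.dot(g_new) / g.dot(g)
        p = -g_new + beta * p
        g = g_new
    return s, info
```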

We now describe the algorithm that uses the truncated conjugate gradient method in the trust-region method with two subproblems and backtracking line search. We use ρk and the flag info to decide the model choice. If the value of ρk for an unconstrained subproblem is smaller than a positive constant η2, or info = 1, we consider that the unconstrained model is not appropriate and choose the trust-region subproblem in the next iteration. We take the unconstrained model if the ratio ρk of the trust-region trial step is larger than a constant β (β close to 1 and β < 1) in 2 consecutive steps. This selection rule is sketched below; the overall algorithm is then given as Algorithm 2.2.
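The model-selection rule just described can be sketched as follows; this is a minimal illustration with hypothetical names, where tr_flag = 0 means the unconstrained model is used and tr_flag = 1 means the trust-region model is used.

```python
def choose_next_model(tr_flag, rho_k, info, btime, eta2, beta):
    # Returns (next tr_flag, updated btime).
    if tr_flag == 0:
        # Unconstrained model: drop it if the step was poor or negative curvature occurred.
        if rho_k < eta2 or info == 1:
            return 1, 0
        return 0, 0
    # Trust-region model: switch after two consecutive steps with rho_k > beta.
    btime = btime + 1 if rho_k > beta else 0
    if btime == 2:
        return 0, 0
    return 1, btime
```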

Algorithm 2.2 (A trust region method with two subproblems)

Step 1 Initialization.

An initial point x0 and an initial trust-region radius Δ0 > 0 are given. The stopping tolerance ε is given. The constants η1, η2, γ1, γ2 and β are also given and satisfy 0 < η1< η2 < β < 1 and 0 < γ1 < 1 < γ2. Set k := 0, btime := 0 and fmin := f(x0). Set flag TR0 := 0, info := 0.

Step 2 Determine a trial step.

if TRk= 0, then compute sk by Algorithm 2.1;

else use truncated CG method to obtain sk.

xt:= xk + sk.

Step 3 Backtracking line search.

ft := f(xt).

if ft< fmin go to step 4;

else if TRk= 1, then carry out backtracking line search;

else then

TRk+1 := 1, btime := 0, k := k + 1, go to step 2.

Step 4 Acceptance of the trial point and update the flag TRk +1 and the trust-region radius.

xk+1 := xt. fmin:= f(xt).

Compute ρk according to (4).

Update Δk+1:

Update btime and TRk+1:

if btime = 2, btime :=0.

k := k + 1

go to step 2.

In the above algorithm, the backtracking line search is carried out using the same formula as in [12]. We try to find the minimum positive integer i such that ƒ(x_k + α^i s) < ƒ(x_k), where α ∈ (0, 1) (see [8]). The step size α is computed by polynomial interpolation, since ƒ(x_k), g(x_k), ∇²ƒ(x_k) and ƒ(x_k + s) are all known. Denoting q = s^T ∇²ƒ(x_k) s, α is determined by the interpolation formula of [12] (choosing α = -g_k^T s / q, or the result of truncated quadratic interpolation, when the denominator of that formula equals zero). Set α_k = max[0.1, α], in case α is too small. It is obvious that evaluating the objective function at two very close points is a waste of computational cost and of available information. The idea of discarding small steps computed as minimizers of interpolation functions was also explored in [2] and [3].
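A minimal backtracking sketch in Python is given below. It uses the simple quadratic-model factor α = -g_k^T s / q mentioned in the parenthetical remark above together with the safeguard α_k = max(0.1, α); the additional cap at 0.9 is an assumption of this sketch, used only to keep the factor inside (0, 1), and the exact interpolation formula of [12] is not reproduced here.

```python
def backtracking_step(f, x, g, H, s, max_tries=10):
    # Backtracking along the failed trial step s (illustrative sketch only).
    fx = f(x)
    q = s.dot(H.dot(s))
    alpha = -g.dot(s) / q if q != 0 else 0.5   # simple fallback when the denominator vanishes
    alpha = max(0.1, min(alpha, 0.9))          # safeguard from the text, plus an assumed cap
    for i in range(1, max_tries + 1):
        if f(x + alpha**i * s) < fx:
            return alpha**i * s                # first power of alpha that gives descent
    return None                                # backtracking failed within max_tries steps
```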

3 Convergence

In this section we present the convergence properties of the algorithm given in the previous section.

In our algorithm, if the unconstrained subproblem is chosen and the Hessian matrix is not positive definite, the trial step will be truncated on the boundary of the trust region, just as in the truncated conjugate gradient method. So the difference arises when the model function is convex and the trial step is large, which provides more decrease of the model function. Thus, in fact, the unconstrained model is used only when the model function is convex. The proof of the following theorem is similar to that of Theorem 3.2 of Steihaug [11] and of Powell [10].

Theorem 3.1. Suppose that ƒ in (1) is twice continuously differentiable and bounded below, and that the norm of the Hessian matrix is bounded. Let ε_k be the relative error in the truncated conjugate gradient method and in Algorithm 2.1, and let the iteration sequence {x_k} be generated by Algorithm 2.2. If ε_k < ε < 1, then

Proof. Since we have

no matter whether the trial step s_k is computed from the trust-region subproblem or the unconstrained subproblem, it follows from Powell's result [10] that

with c1 = .

We prove the theorem by contradiction. If the theorem were not true, we can assume that

Thus, due to (4) and the boundedness of ||Hk||, there exists a positive constant such that

First we show that

Define the following two sets of indices:

Since ƒ is bounded below, we have

which shows that

Hence there exists k0 such that

because ||Hksk + gk|| > δ for all sufficiently large k. This shows that

First we consider the case that U is a finite set. In this case, there exists an integer k1 such that TRk = 1 for all k > k1 and Algorithm 2.2 is essentially the standard trust-region algorithm for all large k. Thus

Letting k_2 = max{k_0, k_1}, we have that

which shows that (6) is true.

Now we consider the case that U has infinitely many elements.

If k ∈ S, TR_k = 0 and k is sufficiently large, we have that TR_{k+1} = 1 and

while if k ∈ S and TR_k = 1, we always have Δ_{k+1} = γ_1 Δ_k. Therefore there exists k_3 such that

Hence

Relations (10), (11) and (17) indicate that

which implies that lim_{k→∞} Δ_k = 0. Therefore, when k → +∞ and ||s_k|| < Δ_k, we have

Thus, Δ_{k+1} ≥ Δ_k if k is sufficiently large and ||s_k|| < Δ_k. If ||s_k|| ≥ Δ_k, we know that TR_k = 0, and our algorithm gives either Δ_{k+1} = Δ_k or Δ_{k+1} = γ_2 Δ_k. This shows that Δ_{k+1} ≥ Δ_k for all large k, which contradicts (18). Hence the assumption made at the beginning of the proof must be false, and the theorem is proved.

The above theorem shows that our algorithm is globally convergent. Furthermore, we can show that our algorithm converges superlinearly if certain conditions are satisfied.

Theorem 3.2. Suppose that ƒ in (1) is twice continuously differentiable and bounded below, that the norm of the Hessian matrix is bounded, that the iterates {x_k} generated by Algorithm 2.2 satisfy x_k → x* as k → ∞, and that the Hessian matrix H(x*) of ƒ at x* is positive definite. Let ε_k be the relative error in the truncated conjugate gradient method and in Algorithm 2.1. If ε_k → 0, then {x_k} converges superlinearly, i.e.,

$$\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0. \qquad (20)$$

Proof. Because x_k → x* and H(x*) > 0, there exists k_1 such that ||H^{-1}(x_k)|| < 2||H^{-1}(x*)|| for all k > k_1. Therefore the Newton step s_k^N = -H^{-1}(x_k) g(x_k) satisfies

for all k > k_1. Therefore, no matter whether the step s_k generated by our algorithm is a trust-region step or a truncated Newton step, we have that

Our previous theorem implies that ||g(xk)|| → 0. Inequality (22) shows that

Consequently, x_{k+1} = x_k + s_k and Δ_{k+1} ≥ Δ_k for all sufficiently large k. Hence ||s_k|| < Δ_k for all sufficiently large k; namely, s_k is an inexact Newton step for all large k, which indicates that

for all sufficiently large k. Relation (24) shows that

Now, (20) follows from the fact that H(x*) > 0 and x_k → x*.

4 Numerical results

In this section we report numerical results for the algorithm given in Section 2 and compare it with the traditional trust-region algorithm as given in [6, 13]. The test problems are the 153 unconstrained problems from the CUTEr collection (see [7]). The names and dimensions of the problems are given in Tables 1-3.

The starting point and the exact first and second derivatives supplied with each problem were used. Numerical tests were performed in double precision on a Dell OptiPlex 755 computer (2.66 GHz, 1.96 GB of RAM) under Linux (Fedora Core 8) with the gcc compiler (version 4.2.3) and default options. All attempts to solve a problem were limited to a maximum of 1000 iterations or 1 hour of CPU time. There is no uniform standard for the choice of the parameters, and the algorithm is not very sensitive to them, so we choose the common values (see, for example, [6, 13]) γ1 = 0.25, γ2 = 2, η1 = 0.1, η2 = 0.75, β = 0.9 and Δ0 = 1. The truncated conjugate gradient method (see [11]) is used to solve the trust-region subproblem. Both algorithms stop when ||∇ƒ(x_k)|| < 10^{-6}.

Our algorithm solved 125 of the 153 CUTEr test problems, while the traditional trust-region method solved 120. Failures occur mostly because the maximum number of iterations is reached. Thus, we find that the new algorithm is as reliable as the traditional one.

Both algorithms fail on the same set of 27 problems. For the remaining 126 problems, the new algorithm needs fewer iterations on 88 problems, the two algorithms take the same number of iterations on 22 problems, and the traditional trust-region method wins on 16 problems. Figure 1 gives the performance profiles (see [5]) of the two algorithms with respect to iterations, and Figure 2 gives the performance profiles with respect to CPU time. Taking into account inaccuracies in timing, we only compare the CPU times of the 49 test problems whose run times are longer than 0.1 second and whose dimensions are larger than 100; the new method takes less time on 33 of these 49 problems. Figures 3, 4 and 5 give the performance profiles for function, gradient and Hessian evaluations. The advantage of the new algorithm is also shown by the total number of evaluations, on which it wins on 77 problems.
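For reference, performance profiles in the sense of Dolan and Moré [5] can be computed as in the following generic sketch (not tied to the data of this paper); for each solver it returns the fraction of problems solved within a factor τ of the best solver.

```python
import numpy as np

def performance_profile(costs, taus):
    # costs: (n_problems, n_solvers) array of iteration counts or CPU times,
    # with np.inf marking a failure; assumes each problem is solved by at least one solver.
    best = costs.min(axis=1, keepdims=True)      # best cost per problem
    ratios = costs / best                        # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for s in range(costs.shape[1])]
                     for tau in taus])
```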






It is easy to see from these figures that the new algorithm is more efficient than the traditional trust-region algorithm.

5 Conclusions

We have proposed a new trust-region algorithm with two subproblems and backtracking line search, using the truncated conjugate gradient method and a variation of it to solve the subproblems. This new algorithm for unconstrained optimization is globally convergent and has a local superlinear convergence rate when the Hessian matrix of the objective function at the local minimizer is positive definite. Numerical results on problems from the CUTEr collection are also given. The results show that the new algorithm is more efficient than the standard trust-region method in terms of the number of iterations and evaluations as well as CPU time.

Acknowledgements. The authors would like to thank an anonymous referee for his valuable comments on an earlier version of the paper.

Received: 28/IV/09.

Accepted: 30/VI/09.

#CAM-99/09.

  • [1] E.G. Birgin and J.M. Martínez, Large-scale active-set box-constrained optimization method with spectral projected gradients. Computational Optimization and Applications, 23 (2002), 101-125.
  • [2] E.G. Birgin, J.M. Martínez and M. Raydan, Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization, 10 (2000), 1196-1211.
  • [3] E.G. Birgin, J.M. Martínez and M. Raydan, Algorithm 813: SPG - software for convex-constrained optimization. ACM Transactions on Mathematical Software, 27 (2001), 340-349.
  • [4] A.R. Conn, N.I.M. Gould and Ph.L. Toint, Trust-Region Methods. SIAM, Philadelphia, USA, (2000).
  • [5] E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2) (2002), 201-213.
  • [6] R. Fletcher, Practical Methods of Optimization, Vol. 1, Unconstrained Optimization. John Wiley and Sons, Chichester, (1980).
  • [7] N.I.M. Gould, D. Orban and Ph.L. Toint, CUTEr (and SifDec), a constrained and unconstrained testing environment, revisited. Transactions of the ACM on Mathematical Software, 29(4) (2003), 373-394.
  • [8] J. Nocedal and Y. Yuan, Combining trust region and line search techniques. Advances in Nonlinear Programming, (1998), 153-175.
  • [9] M.J.D. Powell, The NEWUOA software for unconstrained optimization without derivatives. Large-Scale Nonlinear Optimization, 83 (2006), 256-297, Springer, USA.
  • [10] M.J.D. Powell, Convergence properties of a class of minimization algorithms. Nonlinear Programming 2, Eds. O.L. Mangasarian, R.R. Meyer and S.M. Robinson, (1975), Academic Press, New York.
  • [11] T. Steihaug, The Conjugate Gradient Method and Trust Regions in Large Scale Optimization. SIAM J. Numer. Anal., 20(3) (1983), 626-637.
  • [12] M.Y. Tang, A trust-region-Newton method for unconstrained optimization. Proceedings of the 9th National Conference of ORSC, Eds. Y. Yuan, X.D. Hu and D.G. Liu, (2008), Global-Link, Hong Kong.
  • [13] Y. Yuan, Computational Methods for Nonlinear Optimization. Science Press, Beijing, China, (2008). (in Chinese).