
Using truncated conjugate gradient method in trust-region method with two subproblems and backtracking line search*

*Partially supported by Chinese NSF grant 10831006 and CAS grant kjcx-yw-s7.

Mingyun Tang^I; Ya-Xiang Yuan^II

^I College of Science, China Agricultural University, Beijing, 100083, China

^II State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, P.O. Box 2719, Beijing, 100190, China. E-mails: tangmy@lsec.cc.ac.cn / yyx@lsec.cc.ac.cn

ABSTRACT

A trust-region method with two subproblems and backtracking line search for solving unconstrained optimization is proposed. At every iteration, we use the truncated conjugate gradient method or a variation of it to solve one of the two subproblems approximately. Backtracking line search is carried out when the trust-region trial step fails. We show that this method has the same convergence properties as the traditional trust-region method based on the truncated conjugate gradient method. Numerical results show that this method is as reliable as the traditional one and more efficient in terms of iterations, CPU time and evaluations.

Mathematical subject classification: Primary: 65K05; Secondary: 90C30.

Key words: truncated conjugate gradient, trust-region, two subproblems, backtracking, convergence.

1 Introduction

Consider the unconstrained optimization problem

$$\min_{x \in \mathbb{R}^n} f(x), \qquad (1)$$

where ƒ is a real-valued twice continuously differentiable function, which we assume to be bounded below. Unconstrained optimization problems are essential in mathematical programming because they occur frequently in many real-world applications, and methods for such problems are fundamental in the sense that they can either be applied directly or be extended to optimization problems with constraints. Many effective algorithms have been designed for unconstrained optimization (see [6, 13]); most of them can be classified into two categories: line search algorithms and trust-region algorithms.

The trust-region method is efficient for solving (1). It has a mature framework, strong convergence properties and satisfactory numerical results (see [4]). However, sometimes the trust-region step may be too conservative, especially when the objective function has large "convex basins": the standard trust-region technique may require quite a few iterations to stretch the trust region until it contains the local minimizer. Thus, it is natural to consider modifying the standard trust-region method to obtain a new algorithm that maintains the convergence properties of the standard trust-region method while needing less computational cost.

In previous research [12], we obtained a trust-region method with two subproblems and backtracking line search. A subproblem without the trust-region constraint was introduced into the trust-region framework in order to use the unit-stepsize Newton step. This unconstrained subproblem normally gives a longer trial step, so it is likely that the overall algorithm will reduce the computational cost. Based on the information from the previous iterations, either the trust-region step or the Newton step is used. Moreover, the idea of combining trust-region and line search techniques (see [8]) is also used in that algorithm. The algorithm inherits the convergence results of the traditional trust-region method and gives better performance by making good use of the Newton step and backtracking line search. However, as the exact minimizer of the trust-region subproblem is needed and the Cholesky factorization of the Hessian matrix is used to decide whether the model function is convex, that algorithm is obviously not suitable for large problems.

Therefore, in this paper we propose a new trust-region algorithm with two subproblems, using the truncated conjugate gradient method and a variation of it to solve the subproblems. Our method can be regarded as a modification of the standard trust-region method for unconstrained optimization in such a way that the Newton step can be taken as often as possible. A slightly modified truncated conjugate gradient method is used to compute the Newton step. The global convergence and local superlinear convergence results of the algorithm are also given in the paper.

The paper is organized as follows. In the next section, we give the framework of the method and describe our new algorithm. The convergence properties are presented in Section 3 and the numerical results are provided in Section 4. Some conclusions are given in Section 5.

2 The algorithm

First we briefly review the framework of trust-region method with two subproblems and backtracking line search.

One of the two subproblems is the trust-region subproblem. At the current iteration $x_k$, the trust-region subproblem is

$$\min_{\|s\| \le \Delta_k} m_k(s), \qquad (2)$$

where

$$m_k(s) = g_k^T s + \frac{1}{2} s^T H_k s,$$

$m_k(s)$ is the approximate model function of ƒ(x) within the trust region, $g_k = g(x_k) = \nabla f(x_k)$ and $H_k = \nabla^2 f(x_k)$ or an approximation of $\nabla^2 f(x_k)$. We usually choose $m_k$ to be the first three terms of the Taylor expansion of the objective function ƒ(x) at $x_k$ with the constant term ƒ($x_k$) omitted, as this term does not influence the iteration process.

Another subproblem is defined by

$$\min_{s \in \mathbb{R}^n} m_k(s) = g_k^T s + \frac{1}{2} s^T H_k s, \qquad (3)$$

where $g_k$ and $H_k$ have the same meanings as in (2). Since this subproblem does not have the trust-region constraint, we call it the unconstrained subproblem.

In the ideal situation, the unconstrained subproblem should be used when the model function is convex and gives an accurate approximation to the objective function. Define

$$\rho_k = \frac{f(x_k) - f(x_k + s_k)}{m_k(0) - m_k(s_k)}. \qquad (4)$$

The ratio ρk is used by trust-region algorithms to decide whether the trial step is acceptable and how to update the trust-region radius. In the method given in [12], we also used the value of ρk and the positive definiteness of ∇²ƒ(xk) to decide the model choice, since there the trust-region subproblem is solved exactly. In this paper, we use the truncated conjugate gradient method (see [1, 10]) to compute an approximate minimizer of the trust-region subproblem, as the Cholesky factorization cannot be used for large-scale problems. Now, we consider how to compute the unconstrained model minimizer approximately.
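For concreteness, the following Python/NumPy sketch (with hypothetical names, not the authors' code) shows how the quadratic model and the ratio ρk defined above can be evaluated, assuming ƒ, gk and Hk are available.

```python
import numpy as np

def model_value(g, H, s):
    # Quadratic model m_k(s) = g_k^T s + 0.5 s^T H_k s (constant term omitted).
    return g.dot(s) + 0.5 * s.dot(H.dot(s))

def rho(f, x, g, H, s):
    # Ratio of actual to predicted reduction, used to judge the trial step s.
    actual = f(x) - f(x + s)
    predicted = -model_value(g, H, s)   # m_k(0) - m_k(s), since m_k(0) = 0
    return actual / predicted
```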

Consider using the conjugate gradient method to solve the subproblem (3). The subscript i denotes the inner iteration number. If we do not know whether our quadratic model is strictly convex, precautions must be taken to deal with non-convexity if it arises. Similarly to the analysis of the truncated conjugate gradient method (see [4]), if we minimize $m_k$ without considering whether or not ∇²ƒ(xk) is positive definite, the following two possibilities might arise:

(i) the curvature $\langle p_i, H p_i\rangle$ is positive at each iteration. This means the current model function is convex along the direction $p_i$, as we expect. In this case, we just need to continue the iterations of the conjugate gradient method.

(ii) $\langle p_i, H p_i\rangle < 0$. This means the model function is not strictly convex: $m_k$ is unbounded below along the line $s_i + \sigma p_i$. The unconstrained subproblem is then obviously not suitable for reflecting the behaviour of the objective function around the current iterate. So we should add the trust-region constraint and minimize $m_k$ along $s_i + \sigma p_i$ as far as we can while staying within the trust region. In this case, what we need to do is to find the positive root of $\|s_i + \sigma p_i\| = \Delta_k$, as shown below.
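Since $\|s_i + \sigma p_i\|^2 = \Delta_k^2$ is a quadratic equation in σ, the required positive root has the closed form (a standard computation, recorded here for convenience)

$$\sigma = \frac{-s_i^T p_i + \sqrt{(s_i^T p_i)^2 + \|p_i\|^2 \left( \Delta_k^2 - \|s_i\|^2 \right)}}{\|p_i\|^2}.$$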

In order to avoid conjugate gradient iterations that make very little progress in the reduction of the quadratic model function, the iterations are also terminated once the relative progress condition used in Algorithm 2.1 below,

$$m_k(s_{i-1}) - m_k(s_i) < \kappa_g \bigl( m_k(0) - m_k(s_i) \bigr),$$

is satisfied (see [9]). The iterations are also terminated if the theoretical upper bound of n inner iterations is reached.

Now, we can give an algorithm for solving the unconstrained subproblem approximately. It is a variation of the truncated conjugate gradient method.

Algorithm 2.1

Step 1 (Initialization). Set s_0 = 0, v_0 = g_0 = ∇ƒ(x_k), p_0 = -g_0, curq = 0, preq = 1, kappag = 0.01, ε_k = min(kappag, ), itermax = n, i = 0.

Step 2 While i < itermax

    if preq - curq < kappag·(-curq), stop.

    κ_i = p_i^T H p_i;

    if κ_i < 0, then
        info := 1; if ||s_i|| > Δ_k, stop; else compute σ as the positive root of ||s_i + σ p_i|| = Δ_k, set s_{i+1} = s_i + σ p_i, and stop.
    end if

    α_i = ⟨g_i, v_i⟩ / κ_i,
    s_{i+1} = s_i + α_i p_i,
    update preq, curq and g_{i+1},
    if ||g_{i+1}|| / ||g_0|| < ε_k, stop.
    v_{i+1} = g_{i+1},
    β_i = ⟨g_{i+1}, v_{i+1}⟩ / ⟨g_i, v_i⟩,
    p_{i+1} = -v_{i+1} + β_i p_i,
    i := i + 1,
    go to Step 2.

The above modification of the conjugate gradient method can deal with negative curvature directions; such a technique is also discussed in [1]. In the above algorithm, if info equals 1, the current model function is not convex. It is easy to see that the computational cost of each iteration of the above algorithm is dominated by one matrix-vector multiplication. Thus, it is very likely that the above algorithm is faster than solving H_k s = -g_k by carrying out the Cholesky factorization of H_k directly.
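For illustration only, the following Python/NumPy sketch mirrors Algorithm 2.1 under the conventions stated above; the function name and interface are hypothetical and the tolerance ε_k is simply passed in, so this is a reading aid rather than the authors' implementation.

```python
import numpy as np

def unconstrained_model_cg(g0, H, delta, eps_k, kappa_g=0.01):
    """Sketch of Algorithm 2.1: CG on the unconstrained model, truncated on
    negative curvature.  Returns (s, info); info = 1 means the model is not convex."""
    n = g0.size
    s = np.zeros(n)
    g = g0.copy()
    p = -g
    prev_q, cur_q = 1.0, 0.0              # model values at previous / current inner iterates
    info = 0
    for _ in range(n):                    # theoretical upper bound of n inner iterations
        if prev_q - cur_q < kappa_g * (-cur_q):      # very little progress on the model
            break
        kappa = p.dot(H.dot(p))
        if kappa < 0:                     # negative curvature: model not strictly convex
            info = 1
            if np.linalg.norm(s) <= delta:
                # move to the trust-region boundary along p
                sp, pp = s.dot(p), p.dot(p)
                sigma = (-sp + np.sqrt(sp**2 + pp * (delta**2 - s.dot(s)))) / pp
                s = s + sigma * p
            break
        alpha = g.dot(g) / kappa          # alpha_i = <g_i, v_i> / kappa_i with v_i = g_i
        s = s + alpha * p
        g_new = g + alpha * H.dot(p)      # gradient of the model at the new iterate
        prev_q, cur_q = cur_q, g0.dot(s) + 0.5 * s.dot(H.dot(s))
        if np.linalg.norm(g_new) < eps_k * np.linalg.norm(g0):
            break
        beta = g_new.dot(g_new) / g.dot(g)
        p = -g_new + beta * p
        g = g_new
    return s, info
```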

We now describe the algorithm that uses the truncated conjugate gradient method in the trust-region method with two subproblems and backtracking line search. We use ρk and the flag info to decide the model choice. If the value of ρk for an unconstrained subproblem is smaller than a positive constant η2, or info = 1, we consider that the unconstrained model is not appropriate and choose the trust-region subproblem in the next iteration. We take the unconstrained model if the ratio ρk of the trust-region trial step is larger than a constant β (β close to 1 and β < 1) in 2 consecutive steps. This selection rule is sketched below; the overall algorithm is then given as Algorithm 2.2.
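The model-selection rule just described can be sketched as follows; this is a minimal illustration with hypothetical names, where tr_flag = 0 means the unconstrained model is used and tr_flag = 1 means the trust-region model is used.

```python
def choose_next_model(tr_flag, rho_k, info, btime, eta2, beta):
    # Returns (next tr_flag, updated btime).
    if tr_flag == 0:
        # Unconstrained model: drop it if the step was poor or negative curvature occurred.
        if rho_k < eta2 or info == 1:
            return 1, 0
        return 0, 0
    # Trust-region model: switch after two consecutive steps with rho_k > beta.
    btime = btime + 1 if rho_k > beta else 0
    if btime == 2:
        return 0, 0
    return 1, btime
```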

Algorithm 2.2 (A trust region method with two subproblems)

Step 1 Initialization.

An initial point x0 and an initial trust-region radius Δ0 > 0 are given. The stopping tolerance ε is given. The constants η1, η2, γ1, γ2 and β are also given and satisfy 0 < η1< η2 < β < 1 and 0 < γ1 < 1 < γ2. Set k := 0, btime := 0 and fmin := f(x0). Set flag TR0 := 0, info := 0.

Step 2 Determine a trial step.

if TRk= 0, then compute sk by Algorithm 2.1;

else use truncated CG method to obtain sk.

xt:= xk + sk.

Step 3 Backtracking line search.

ft := f(xt).

if ft< fmin go to step 4;

else if TRk= 1, then carry out backtracking line search;

else then

TRk+1 := 1, btime := 0, k := k + 1, go to step 2.

Step 4 Acceptance of the trial point and update the flag TRk +1 and the trust-region radius.

xk+1 := xt. fmin:= f(xt).

Compute ρk according to (4).

Update Δk+1:

Update btime and TRk+1:

if btime = 2, btime :=0.

k := k + 1

go to step 2.

In the above algorithm, the backtracking line search is carried out using the same formula as in [12]. We try to find the minimum positive integer i such that ƒ(x_k + α^i s) < ƒ(x_k), where α ∈ (0, 1) (see [8]). The step size α is computed by polynomial interpolation, since ƒ(x_k), g(x_k), ∇²ƒ(x_k) and ƒ(x_k + s) are all known. Denoting q = s^T ∇²ƒ(x_k) s, α is determined by the interpolation formula of [12] (choosing α = -g_k^T s / q, or the result of truncated quadratic interpolation, when the denominator of that formula equals zero). Set α_k = max[0.1, α], in case α is too small. It is obvious that evaluating the objective function at two very close points is a waste of computational cost and of available information. The idea of discarding small steps computed as minimizers of interpolation functions was also explored in [2] and [3].
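A minimal backtracking sketch in Python is given below. It uses the simple quadratic-model factor α = -g_k^T s / q mentioned in the parenthetical remark above together with the safeguard α_k = max(0.1, α); the additional cap at 0.9 is an assumption of this sketch, used only to keep the factor inside (0, 1), and the exact interpolation formula of [12] is not reproduced here.

```python
def backtracking_step(f, x, g, H, s, max_tries=10):
    # Backtracking along the failed trial step s (illustrative sketch only).
    fx = f(x)
    q = s.dot(H.dot(s))
    alpha = -g.dot(s) / q if q != 0 else 0.5   # simple fallback when the denominator vanishes
    alpha = max(0.1, min(alpha, 0.9))          # safeguard from the text, plus an assumed cap
    for i in range(1, max_tries + 1):
        if f(x + alpha**i * s) < fx:
            return alpha**i * s                # first power of alpha that gives descent
    return None                                # backtracking failed within max_tries steps
```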

3 Convergence

In this section we present the convergence properties of the algorithm given in the previous section.

In our algorithm, if the unconstrained subproblem is chosen and the Hessian matrix is not positive definite, the trial step will be truncated on the boundary of the trust region, just as in the truncated conjugate gradient method. So the difference arises when the model function is convex and the trial step is large, which provides more decrease of the model function. Thus, in fact, the unconstrained model is used only when the model function is convex. The proof of the following theorem is similar to that of Theorem 3.2 of Steihaug [11] and of Powell [10].

Theorem 3.1. Suppose that ƒ in (1) is twice continuously differentiable and bounded below, and that the norm of the Hessian matrix is bounded. Let ε_k be the relative error in the truncated conjugate gradient method and in Algorithm 2.1, and let the iteration sequence {x_k} be generated by Algorithm 2.2. If ε_k < ε < 1, then

Proof. Since we have

no matter whether the trial step s_k is computed from the trust-region subproblem or the unconstrained subproblem, it follows from Powell's result [10] that

with c1 = .

We prove the theorem by contradiction. If the theorem were not true, we can assume that

Thus, due to (4) and the boundedness of ||Hk||, there exists a positive constant such that

First we show that

Define the following two sets of indices:

Since ƒ is bounded below, we have

which shows that

Hence there exists k0 such that

because ||Hksk + gk|| > δ for all sufficiently large k. This shows that

First we consider the case that U is a finite set. In this case, there exists an integer k1 such that TRk = 1 for all k > k1 and Algorithm 2.2 is essentially the standard trust-region algorithm for all large k. Thus

Letting k_2 = max{k_0, k_1}, we have that

which shows that (6) is true.

Now we consider the case that U has infinitely many elements.

If k ∈ S, TR_k = 0 and k is sufficiently large, we have that TR_{k+1} = 1 and

while if k ∈ S and TR_k = 1, we always have Δ_{k+1} = γ_1 Δ_k. Therefore there exists k_3 such that

Hence

Relations (10), (11) and (17) indicate that

which implies that lim_{k→∞} Δ_k = 0. Therefore, when k → +∞ and ||s_k|| < Δ_k, we have

Thus, Δ_{k+1} ≥ Δ_k if k is sufficiently large and ||s_k|| < Δ_k. If ||s_k|| ≥ Δ_k, we know that TR_k = 0, and our algorithm gives either Δ_{k+1} = Δ_k or Δ_{k+1} = γ_2 Δ_k. This shows that Δ_{k+1} ≥ Δ_k for all large k, which contradicts (18). Hence the assumption made at the beginning of the proof must be false, and the theorem is proved.

The above theorem shows that our algorithm is globally convergent. Furthermore, we can show that our algorithm converges superlinearly if certain conditions are satisfied.

Theorem 3.2. Suppose that ƒ in (1) is twice continuously differentiable and bounded below, that the norm of the Hessian matrix is bounded, that the iterates {x_k} generated by Algorithm 2.2 satisfy x_k → x* as k → ∞, and that the Hessian matrix H(x*) of ƒ at x* is positive definite. Let ε_k be the relative error in the truncated conjugate gradient method and in Algorithm 2.1. If ε_k → 0, then {x_k} converges superlinearly, i.e.,

$$\lim_{k \to \infty} \frac{\|x_{k+1} - x^*\|}{\|x_k - x^*\|} = 0. \qquad (20)$$

Proof. Because x_k → x* and H(x*) > 0, there exists k_1 such that ||H^{-1}(x_k)|| < 2||H^{-1}(x*)|| for all k > k_1. Therefore the Newton step s_k^N = -H^{-1}(x_k) g(x_k) satisfies

for all k > k_1. Therefore, no matter whether the step s_k generated by our algorithm is a trust-region step or a truncated Newton step, we have that

Our previous theorem implies that ||g(xk)|| → 0. Inequality (22) shows that

Consequently, x_{k+1} = x_k + s_k and Δ_{k+1} ≥ Δ_k for all sufficiently large k. Hence ||s_k|| < Δ_k for all sufficiently large k; namely, s_k is an inexact Newton step for all large k, which indicates that

for all sufficiently large k. Relation (24) shows that

Now, (20) follows from the fact that H(x*) > 0 and x_k → x*.

4 Numerical results

In this section we report numerical results for the algorithm given in Section 2 and compare it with the traditional trust-region algorithm as given in [6, 13]. The test problems are the 153 unconstrained problems from the CUTEr collection (see [7]). The names and dimensions of the problems are given in Tables 1-3.

The starting point and the exact first and second derivatives supplied with each problem were used. Numerical tests were performed in double precision on a Dell OptiPlex 755 computer (2.66 GHz, 1.96 GB of RAM) under Linux (Fedora Core 8) with the gcc compiler (version 4.2.3) and default options. All attempts to solve a problem were limited to a maximum of 1000 iterations or 1 hour of CPU time. There is no uniform standard for the choice of the parameters, and the algorithm is not very sensitive to them, so we choose the common values (see, for example, [6, 13]) γ1 = 0.25, γ2 = 2, η1 = 0.1, η2 = 0.75, β = 0.9 and Δ0 = 1. The truncated conjugate gradient method (see [11]) is used to solve the trust-region subproblem. Both algorithms stop when ||∇ƒ(x_k)|| < 10^{-6}.

Our algorithm solved 125 of the 153 CUTEr test problems, while the traditional trust-region method solved 120. Failures occur mostly because the maximum number of iterations is reached. Thus, we find that the new algorithm is as reliable as the traditional one.

Both algorithms fail on the same set of 27 problems. For the remaining 126 problems, the new algorithm needs fewer iterations on 88 problems, the two algorithms take the same number of iterations on 22 problems, and the traditional trust-region method wins on 16 problems. Figure 1 gives the performance profiles (see [5]) of the two algorithms with respect to iterations, and Figure 2 gives the performance profiles with respect to CPU time. Taking into account inaccuracies in timing, we only compare the CPU times of the 49 test problems whose run times are longer than 0.1 second and whose dimensions are larger than 100; the new method takes less time on 33 of these 49 problems. Figures 3, 4 and 5 give the performance profiles for function, gradient and Hessian evaluations. The advantage of the new algorithm is also shown by the total number of evaluations, on which it wins on 77 problems.
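For reference, performance profiles in the sense of Dolan and Moré [5] can be computed as in the following generic sketch (not tied to the data of this paper); for each solver it returns the fraction of problems solved within a factor τ of the best solver.

```python
import numpy as np

def performance_profile(costs, taus):
    # costs: (n_problems, n_solvers) array of iteration counts or CPU times,
    # with np.inf marking a failure; assumes each problem is solved by at least one solver.
    best = costs.min(axis=1, keepdims=True)      # best cost per problem
    ratios = costs / best                        # performance ratios r_{p,s}
    return np.array([[np.mean(ratios[:, s] <= tau) for s in range(costs.shape[1])]
                     for tau in taus])
```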






It is easy to see from these figures that the new algorithm is more efficient than the traditional trust-region algorithm.

5 Conclusions

We have proposed a new trust-region algorithm with two subproblems and backtracking line search, using the truncated conjugate gradient method and a variation of it to solve the subproblems. This new algorithm for unconstrained optimization is globally convergent and has a local superlinear convergence rate when the Hessian matrix of the objective function at the local minimizer is positive definite. Numerical results on problems from the CUTEr collection are also given. The results show that the new algorithm is more efficient than the standard trust-region method in terms of the number of iterations and evaluations as well as CPU time.

Acknowledgements. The authors would like to thank an anonymous referee for his valuable comments on an earlier version of the paper.

Received: 28/IV/09.

Accepted: 30/VI/09.

#CAM-99/09.

  • [1] E.G. Birgin and J.M. Martínez, Large-scale active-set box-constrained optimization method with spectral projected gradients. Computational Optimization and Applications, 23 (2002), 101-125.
  • [2] E.G. Birgin, J.M. Martínez and M. Raydan, Nonmonotone spectral projected gradient methods on convex sets. SIAM Journal on Optimization, 10 (2000), 1196-1211.
  • [3] E.G. Birgin, J.M. Martínez and M. Raydan, Algorithm 813: SPG - software for convex-constrained optimization. ACM Transactions on Mathematical Software, 27 (2001), 340-349.
  • [4] A.R. Conn, N.I.M. Gould and Ph.L. Toint, Trust-Region Methods. SIAM, Philadelphia, USA, (2000).
  • [5] E.D. Dolan and J.J. Moré, Benchmarking optimization software with performance profiles. Mathematical Programming, 91(2) (2002), 201-213.
  • [6] R. Fletcher, Practical Methods of Optimization, Vol. 1, Unconstrained Optimization. John Wiley and Sons, Chichester, (1980).
  • [7] N.I.M. Gould, D. Orban and Ph.L. Toint, CUTEr (and SifDec), a constrained and unconstrained testing environment, revisited. Transactions of the ACM on Mathematical Software, 29(4) (2003), 373-394.
  • [8] J. Nocedal and Y. Yuan, Combining trust region and line search techniques. Advances in Nonlinear Programming, (1998), 153-175.
  • [9] M.J.D. Powell, The NEWUOA software for unconstrained optimization without derivatives. Large-Scale Nonlinear Optimization, 83 (2006), 256-297, Springer, USA.
  • [10] M.J.D. Powell, Convergence properties of a class of minimization algorithms. Nonlinear Programming 2, Eds. O.L. Mangasarian, R.R. Meyer and S.M. Robinson, (1975), Academic Press, New York.
  • [11] T. Steihaug, The Conjugate Gradient Method and Trust Regions in Large Scale Optimization. SIAM J. Numer. Anal., 20(3) (1983), 626-637.
  • [12] M.Y. Tang, A trust-region-Newton method for unconstrained optimization. Proceedings of the 9th National Conference of ORSC, Eds. Y. Yuan, X.D. Hu and D.G. Liu, (2008), Global-Link, Hong Kong.
  • [13] Y. Yuan, Computational Methods for Nonlinear Optimization. Science Press, Beijing, China, (2008). (in Chinese).