
The effect of the nonlinearity on GCV applied to conjugate gradients in computerized tomography

Reginaldo J. Santos (I); Álvaro R. de Pierro (II)

(I) Departamento de Matemática, ICEx, Universidade Federal de Minas Gerais, CP 702, 30123-970 Belo Horizonte, MG, Brazil. Email: regi@mat.ufmg.br

(II) Department of Applied Mathematics, IMECC, State University of Campinas, CP 6065, 13081-970 Campinas, SP, Brazil. Email: alvaro@ime.unicamp.br

ABSTRACT

We study the effect of the nonlinear dependence of the iterate x_k of the Conjugate Gradients method (CG) on the data b in the GCV procedure used to stop the iterations. We compare two versions of using GCV to stop CG: in one version we compute the GCV function with the iterate x_k depending linearly on the data b, and in the other one it depends nonlinearly. We have tested the two versions on a large-scale problem: positron emission tomography (PET). Our results suggest the necessity of taking the nonlinearity into account in the GCV function in order to obtain a reasonable stopping criterion.

Mathematical subject classification: 62G05, 92C55, 65F10, 65C05.

Keywords: Generalized Cross Validation, Tomography, Conjugate Gradients.

1 Introduction

We consider the problem of estimating a solution x of

b = Ax + ε,     (1.1)

where A is a real m × n matrix, b is an m-vector of observations and ε is an error vector. We assume that the components of ε are random with zero mean, uncorrelated and with (unknown) variance σ², i.e.,

E(ε) = 0,   E(εεᵗ) = σ²I,     (1.2)

where E denotes the expectation and I is the identity matrix.

If A is a full rank matrix, the Gauss-Markov Theorem states that the least squares solution of (1.1), i.e., x̂ = (AᵗA)⁻¹Aᵗb, is the best linear unbiased estimator of x, meaning that it has minimum variance among the unbiased linear estimators (see, for example, [26]). But if A is ill-conditioned, this minimum variance can still be large. It is well known that, if we allow the estimator to be biased, the variance can be drastically reduced (see [28, 29]).

One way to do this is by considering solutions of regularized problems of the form

minimize ||Ax − b||² + λ||Bx||²,     (1.3)

for a positive real number λ, where B is an n × n matrix that introduces a priori information on the problem. For example, B could be the matrix associated to the discretization of derivatives, enforcing smoothness of x.
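As a concrete illustration of (1.3) with B = I, the following Python/NumPy sketch (a made-up toy problem, not the authors' experiment) solves the regularized normal equations and shows how even a small λ tames the noise amplification caused by ill-conditioning:

```python
import numpy as np

def tikhonov(A, b, B, lam):
    """Solve min_x ||Ax - b||^2 + lam*||Bx||^2 via the regularized
    normal equations (A^T A + lam B^T B) x = A^T b."""
    return np.linalg.solve(A.T @ A + lam * (B.T @ B), A.T @ b)

# Tiny ill-conditioned example: the exact solution is (1, 1), but the data
# carry noise 1e-3 along the direction of the small singular value 1e-4.
A = np.array([[1.0, 0.0],
              [0.0, 1e-4]])
b = np.array([1.0, 1e-4 + 1e-3])
B = np.eye(2)                       # B = I: plain ridge regularization
x_ls  = tikhonov(A, b, B, 0.0)      # unregularized: second component blows up
x_reg = tikhonov(A, b, B, 1e-6)     # small lambda: biased, but far more stable
```

Here the unregularized second component is amplified by a factor 1/σ₂ = 10⁴ of the noise, while the regularized one stays near the true value, at the price of a small bias in the first component.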

Another way is to apply an iterative method that converges to the solution of

minimize ||Ax − b||².     (1.4)

In this case, a regularization effect is obtained by choosing as 'solution' an early iterate of the method (see [31, 9] and the references therein, or [14, Chapter 7]). Here the iteration number k plays the role of the parameter λ of the first approach. For stationary iterative methods, it is shown in [10] and [23] that this approach is, in some sense, equivalent to the first one.

A crucial point in both approaches is the choice of the regularization parameter λ in (1.3), or of the iteration number k in the second approach. A possibility is to approximate a value of λ for which the average of the mean square error is minimum, i.e., λ solves the problem

where

where x_λ is the solution of (1.3) and x is a solution of (1.1). Probably the most popular of these methods is Generalized Cross-Validation (GCV) (see [33, 16]).

If x_λ is an estimator for x (the solution of (1.3), or one iterate of an iterative method convergent to the solution of (1.4)), the influence operator A_λ is defined by

A_λ b = A x_λ.     (1.7)

For the solution of (1.3), the influence operator is given by

A_λ = A(AᵗA + λBᵗB)⁻¹Aᵗ.     (1.8)

If A_λ is given by (1.8), the GCV function V(λ) was defined in [6, 13] as

V(λ) = [(1/m)||(I − A_λ)b||²] / [(1/m)Tr(I − A_λ)]².     (1.9)

GCV chooses the regularization parameter by minimizing V(l). This estimate is a rotationally invariant version of Allen's PRESS, or Cross-Validation [1].
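For the linear influence operator (1.8) with B = I, the GCV function can be evaluated cheaply through the SVD of A via the filter factors sᵢ²/(sᵢ² + λ). A minimal Python/NumPy sketch (the function name and test data are illustrative):

```python
import numpy as np

def gcv_tikhonov(A, b, lams):
    """Evaluate V(lam) = (1/m)||(I - A_lam) b||^2 / ((1/m) Tr(I - A_lam))^2
    for the Tikhonov influence operator A_lam = A (A^T A + lam I)^{-1} A^T,
    using the SVD of A (assumes m >= n)."""
    m, n = A.shape
    U, s, _ = np.linalg.svd(A, full_matrices=True)
    beta = U.T @ b                                         # data in the left singular basis
    V = []
    for lam in lams:
        f = s**2 / (s**2 + lam)                            # filter factors of A_lam
        num = np.sum(((1.0 - f) * beta[:n])**2) + np.sum(beta[n:]**2)
        den = (m - np.sum(f))**2                           # (Tr(I - A_lam))^2
        V.append(m * num / den)
    return np.array(V)

# GCV picks the lambda minimizing V over a grid:
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
b = np.array([1.0, 0.0, 1.0])
grid = np.logspace(-6, 2, 50)
lam_gcv = grid[np.argmin(gcv_tikhonov(A, b, grid))]
```

The SVD route is affordable only for moderate sizes; the bidiagonalization and stochastic trace approximations cited below exist precisely because the trace term is the expensive part at large scale.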

In [6, 13] it was proven that, if A_λ is given by (1.8), then

where μ₁ = (1/m)Tr(A_λ), whenever 0 < μ₁ < 1. In [8], Eldén presents a method to compute the trace term of the GCV function using bidiagonalization. In [12], Golub and von Matt proposed an iterative method to approximate the trace term of the GCV function for large-scale problems. In [21], Golub, Nguyen and Milanfar extended the derivation of the GCV function to underdetermined systems, with applications to superresolution reconstruction.

Another approach is to apply an iterative method directly to the least squares problem

minimize ||Ax − b||²

and take as 'solution' an early iterate of the method. In [10] and [23] it is proven that this approach is, in some sense, equivalent to (1.3). Now the role of the parameter λ is played by the iteration number k, and the GCV functional can be used to choose this number, that is, as a stopping rule, as shown in [32, 25] for stationary linear iterative methods. It is well known that the Conjugate Gradients method applied to the normal equations (CGLS) of problem (1.1) can achieve its best accuracy significantly faster than stationary methods, and that CGLS can be used as a regularizing algorithm (see [22]). However, the iterates of CG are a nonlinear function of the data vector b, and, as pointed out in [15], a very good stopping rule should be used. In [15], Hanke and Hansen suggest a Monte Carlo GCV procedure to stop CGLS and the ν-method ([15], pages 298-299). But the nonlinearity of the influence operator in the case of CGLS is not taken into account: while for the ν-method the influence operator is linear or affine, for CGLS it is nonlinear. Indeed, they observe that the application of their procedure "can suffer severely from the nonlinearity of CGLS". Also in [15] an even cruder approximation is used, in which the denominator of the GCV functional is approximated by a fixed expression depending on the matrix rank p, the same for every value of b ([15], page 307).

The main goal of this paper is to compare a Monte Carlo GCV procedure, deduced in Section 2, that takes the nonlinearity into account with another one that linearizes the influence operator, on large-scale problems like those arising in Positron Emission Tomography (PET).

Another alternative to GCV is the use of the L-curve (see [15]). It is not considered in this article because we wanted to make explicit the importance of considering the nonlinear part of CG, and because the L-curve lacks most of the statistical information provided by the problem.

In Section 2 we derive an inequality like (1.10) for the case when A_λ is a nonlinear function of b. We also show how to use a Monte Carlo approach, as in [11], in order to compute the denominator of our GCV functional. In Section 3 we describe the implementation of our procedure applied to a preconditioned Conjugate Gradients method (PCCGLS). In Section 4 we present numerical experiments simulating a PET problem, and in Section 5 some concluding remarks.

2 GCV for a nonlinear influence operator

We would like to obtain a good estimate for

T(λ) = (1/m) ||A x_λ − Ax||²,     (2.1)

using the relative residual defined by

U(λ) = (1/m) ||A x_λ − b||²,     (2.2)

and our information (1.2) on the error ε.

The next result is a technical lemma.

Lemma 1. Let F(λ), g(λ), G(λ), r(λ) and H(λ) be real-valued functions of the real parameter λ. If

and

then the following expression is valid:

Proof. From (2.3) and (2.4) we have that

which implies (2.5).

Throughout this paper we will assume that the influence operator, defined by

is a continuously differentiable operator as a function of b, with b varying in an open set Ω that contains Ax. Denoting by DA_λ(·) the Jacobian of A_λ(·), we have the following result.

Proposition 2. Let {x_λ} be a family of estimators for the solution of (1.1), for which (1.2) holds. If, for each λ, A_λ(·) is continuously differentiable in Ω, then

where U(λ) is given by (2.2), T(λ) by (2.1), and the function O_λ(t) is such that ||O_λ(t)|| < M t, for some M > 0.

Proof. Expanding the square of the residual of (1.1) and (1.6) at x_λ, we get

Now, expanding A_λ(·) at Ax + ε around the point Ax by Taylor's formula:

where O_λ(t) satisfies the hypothesis. Using (1.1) and (2.10) above, we obtain

Dividing (2.9) by m, applying the expectation and using (1.2) and (2.11), we obtain

Let us now introduce the GCV function for nonlinear influence operators. For Ax ∈ Ω, we define the GCV functional by

In spite of the fact that we do not know the value of Ax, we will show below how to obtain a good approximation of the denominator of (2.13) for large-scale problems.

Theorem 3. Let {x_λ} be a family of estimators for the solution of (1.1) for which (1.2) is valid. Then for V(λ) given by (2.13) the following equality holds

where μ₁(λ) = Tr(DA_λ(Ax)) and r(λ) = E(t O_λ(t)).

Proof. By Proposition 2, inequality (2.8) holds. Applying Lemma 1 we get (2.14).

The next Proposition is the basis of our approximation scheme.

Proposition 4. Let w = (w₁,...,w_m) be a vector of random components with normal distribution, that is, w ~ N(0, I_{m×m}). Suppose that A_λ(·) is continuously differentiable in the open set Ω that contains Ax. Let x_λ⁺ and x_λ⁻ be the estimators corresponding to the data vectors b + δw and b − δw, respectively. Then

Proof. Expanding A_λ(·) at Ax + ε + δw and Ax + ε − δw around the point Ax, up to first order:

and

Thus, we obtain

and the result follows using a slight extension of Theorem 2.2 of [11].

For the sake of consistency with the usual mathematical notation for the iterations of an algorithm, in what follows the parameter λ will be replaced by k.

3 Monte Carlo Implementation of GCV for Conjugate Gradients

Our main goal in this section is to apply our Monte Carlo implementation of GCV as a stopping rule for the Conjugate Gradients method for solving (1.1). Now each iterate x_k is an estimate of the solution of (1.1), and k plays the role of λ. In Section 2 we showed that GCV is applicable to influence operators that are continuously differentiable in an open set that contains Ax.

The Conjugate Gradients method of Hestenes and Stiefel applied to the normal equations (CGLS, see [3]) can be written as follows (for a given x₀):

For each k > 0, the operator T⁽ᵏ⁾(b) = x_k is nonlinear, and it is continuously differentiable outside the (closed) termination set

where P_U denotes the orthogonal projection onto the subspace U (i.e., Ω is the complement of M_k). If A is an ill-conditioned operator, then T⁽ᵏ⁾(·) is discontinuous on M_{k−1}, as pointed out by Eicke, Louis and Plato in [7] in the infinite-dimensional context. The same arguments of [7] can easily be used to understand the behavior of the method in the discrete case. For b in M_{k−1}, we define b_l = b + u_l, where u_l is the singular vector associated to the singular value σ_l. From the fact that b_l ∈ M_k it follows that

Thus, if σ_l ≈ 0, then ||T⁽ᵏ⁾(b) − T⁽ᵏ⁾(b_l)|| ≈ ∞. And if b ∈ M_{k−1} and b_w is b perturbed by a random vector w, we should also have ||T⁽ᵏ⁾(b) − T⁽ᵏ⁾(b_w)|| ≈ ∞. But usually, in ill-posed problems, the vector b has components along all the singular vectors. This means that b does not belong to a termination set M_k for any k.

Taking into account the results of the preceding section, we now describe the algorithm for calculating an approximation of the GCV functional (2.13), using central differences to approximate the directional derivative.

Algorithm 1.

(i) Generate a pseudo-random vector w = (w₁,...,w_m)ᵗ ∈ ℝᵐ from a normal distribution with standard deviation equal to one, i.e.,

w ~ N(0, I_{m×m}).

(ii) For each k, compute the iterates x_k, x_k⁻ and x_k⁺ corresponding to b, b − δw and b + δw, respectively, where δ = 10⁻⁴. Take

as an approximation for (Tr[I_m − DA_k(Ax)])²;

(iii) For each k take
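A sketch of Algorithm 1 in Python/NumPy (a toy CGLS routine is included to keep the sketch self-contained; all names and the test problem are illustrative, not the PET setup of Section 4). Step (ii) rests on the identity E[wᵗMw] = Tr(M) behind Proposition 4, with the Jacobian-vector product DA_k(b)w approximated by a central difference of two perturbed CGLS runs:

```python
import numpy as np

def cg_normal_eq(A, b, k):
    """k iterations of CG on the normal equations A^T A x = A^T b, from x0 = 0."""
    x = np.zeros(A.shape[1])
    r = b.copy()
    s = A.T @ r
    p = s.copy()
    g = s @ s
    for _ in range(k):
        q = A @ p
        alpha = g / (q @ q)
        x = x + alpha * p
        r = r - alpha * q
        s = A.T @ r
        g_new = s @ s
        p = s + (g_new / g) * p
        g = g_new
    return x

def monte_carlo_gcv(A, b, k_max, delta=1e-4, seed=0):
    """Approximate the GCV function V(k) of (2.13) for CGLS, as in Algorithm 1:
    the trace term Tr[I_m - DA_k] is estimated with a single Gaussian probe w
    and a central difference of two perturbed CGLS runs."""
    m = A.shape[0]
    w = np.random.default_rng(seed).normal(size=m)      # step (i)
    V = []
    for k in range(1, k_max + 1):                       # step (ii)
        xk = cg_normal_eq(A, b, k)
        x_plus = cg_normal_eq(A, b + delta * w, k)
        x_minus = cg_normal_eq(A, b - delta * w, k)
        jvp = A @ (x_plus - x_minus) / (2.0 * delta)    # ~ DA_k(b) w
        trace_est = m - w @ jvp                         # w^T M w estimates Tr(M)
        resid = b - A @ xk
        V.append((resid @ resid / m) / (trace_est / m) ** 2)   # step (iii)
    return np.array(V)

# Illustrative use on a mildly ill-conditioned toy problem:
A = np.vander(np.linspace(0.0, 1.0, 12), 6, increasing=True)
b = A @ np.ones(6) + 0.01 * np.random.default_rng(1).normal(size=12)
V = monte_carlo_gcv(A, b, 5)
k_gcv = 1 + int(np.argmin(V))       # GCV stopping index
```

Each GCV evaluation costs three CGLS runs (for b and the two perturbed data vectors); in practice the three recursions are advanced together so that V(k) is available at every iteration.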

4 Positron emission tomography

The goal of positron emission tomography (PET) is the quantitative determination of the moment-to-moment changes in the chemistry and flow physiology of radioactively labelled components inside the body. The mathematical problem consists of reconstructing a function representing the distribution of radioactivity in a body cross-section from measured data that are the total activity along lines of known location. One of the main differences between this problem and the one arising in X-ray tomography [17] is that here measurements tend to be much noisier, so direct inversion using convolution backprojection (CBP) does not necessarily give the best results (see [30]).

In positron emission tomography (PET) [27], the isotope used emits positrons, which annihilate with nearby electrons, generating two photons travelling away from each other in (nearly) opposite directions; the number of such photon pairs (detected in time coincidence) for each line, or pair of detectors, is related to the integral of the concentration of the isotope along the line.

Suppose now that we discretize the problem by subdividing the reconstruction region into n small square-shaped picture elements (pixels, for short), and we assume that the activity in each pixel j is a constant, denoted by x_j. If we count b_i coincidences along m lines, and a_ij denotes the probability that a photon emitted by pixel j is detected by detector pair i, then b_i is a sample from a Poisson distribution whose expected value is Σ_j a_ij x_j.
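This discretized measurement model is straightforward to simulate (Python/NumPy sketch; the sizes and probability values below are made up, and much smaller than the 95 × 95 grid used later in this section):

```python
import numpy as np

# Discretized emission model: the expected count along line i is sum_j a_ij x_j,
# and the measured count b_i is a Poisson sample with that mean.
rng = np.random.default_rng(0)
n_pixels, n_lines = 16, 40                            # toy sizes (illustrative)
A = rng.uniform(0.0, 0.05, size=(n_lines, n_pixels))  # detection probabilities a_ij
x = rng.uniform(50.0, 150.0, size=n_pixels)           # activity x_j in each pixel
expected = A @ x                                      # expected counts per line
b = rng.poisson(expected)                             # simulated coincidence counts
```

Note that the noise level is tied to the counts themselves: the relative error per line scales like 1/√(expected count), which is why lower total photon counts give noisier data in the experiments below.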

It is well known that ART with small relaxation parameters [18] approximates a weighted least squares solution of (1.1). This fact suggests that the application of Conjugate Gradients to the system (4.21) could give similar or better results [24], and it is a reasonable example for testing the capability of GCV as a stopping rule for CG.

With the objective of comparing the GCV procedures in computerized tomography, we apply the Conjugate Gradients method "preconditioned" with a symmetric ART (algebraic reconstruction technique) method, as presented recently in [24], i.e., we apply CG to solve the generalized least squares problem (CGGLS)

where C_ω = (D + ωL), with AAᵗ = L + D + Lᵗ, where L is the strictly lower triangular part and D the diagonal of AAᵗ, respectively.

This corresponds to applying CG to the system

In the ART method, ω is the relaxation parameter. Here it has a different role: it is the weight of the strictly lower triangular part of AAᵗ in the preconditioning. When ω is zero, this corresponds to a diagonal scaling.
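The splitting behind the preconditioner can be stated compactly in code (a Python/NumPy sketch with an illustrative function name; AAᵗ is formed explicitly only for this small demonstration, whereas in practice L is never formed, as noted below):

```python
import numpy as np

def art_preconditioner(A, omega):
    """C_omega = D + omega * L, where A A^T = L + D + L^T, with L the strictly
    lower triangular part and D the diagonal of A A^T; omega = 0 gives plain
    diagonal scaling."""
    M = A @ A.T                      # formed explicitly only for this small demo
    D = np.diag(np.diag(M))          # diagonal part of A A^T
    L = np.tril(M, k=-1)             # strictly lower triangular part
    return D + omega * L
```

At ω = 1 the splitting recovers AAᵗ exactly once the transposed strictly lower part is added back, which is the defining property of the decomposition AAᵗ = L + D + Lᵗ.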

The Conjugate Gradients method applied to (4.21) can be written as:

The elements of the matrix L are not explicitly known, but an efficient way to compute the products Aᵗr and Aw was shown in [2].

In our numerical experiments we used the programming system SNARK93, developed by the Medical Image Processing Group of the University of Pennsylvania [5]. The images to be reconstructed (phantoms) were obtained from a computerized atlas based on typical values inside the brain, as in [18]. The data collection geometry was a divergent one, simulating a typical PET data acquisition [5]. We used a discretization with n = 95 × 95 pixels, and the divergent geometry had 300 views of 101 rays each, for a total of m = 30292 equations (8 rays were not considered because of their lack of intersection with the image region). The starting point was a uniform image x₀ = (a,...,a)ᵗ, where a is an approximation of the average density of the phantom given by

The choice of a uniform non-zero starting point was advocated by L. Kaufman [20], and it is widely accepted as the best choice by many researchers in PET. The vector b was taken from a pseudorandom number generator with a Poisson distribution (see [19]). The total photon counts were 2 022 085, 991 179, 514 925 and 238 172 (monotonically increasing noise).

We performed experiments applying Algorithm 1 with photon counts 2 022 085, 991 179, 514 925 and 238 172, and with six values of ω between 0.0 and 0.025. For each value of ω and each photon count we repeated the experiment ten times. The ten tests gave very similar results.

For one of the tests, we plotted in Figure 1 the functions

against the iteration number k, for ω = 0.0 (diagonal scaling) and ω = 0.025. Here we call V_HH the Monte Carlo GCV function obtained by supposing that PCCGLS is a linear function of the data vector b, as proposed by Hanke and Hansen in [15]. As expected, the minimum of our Monte Carlo GCV function V_MC(k) coincides with that of ||A(x_k − x)||², and the curves are very similar. We can see that using V_HH can give wrong results in the case of Conjugate Gradients. Figures 2 and 3 show the reconstructions corresponding to the curves in row 1 of Fig. 1.




5 Concluding remarks

Iterative methods in Emission Computed Tomography, like RAMLA [4], that are currently used by PET scanners (see, for example, http://www.medical.philips.com/us/products/pet/products/cpet/), are fast and produce high-quality images in few iterations, because they take into account the Poisson nature of the noise. So our goal in this paper was not to compare Conjugate Gradients with those methods, but to show that our approximation of GCV is applicable as a stopping rule for the Conjugate Gradients method on large-scale ill-posed problems similar to Positron Emission Tomography (very large scale, not severely ill-posed, and with relatively large noise in the data). Some authors (see, for example, [15, 9]) have suggested the use of other approximations to GCV. Our experimental results (Fig. 1) show that Algorithm 1 gives very good approximations for the minimum of ||A(x_k − x)||², much better than those of Hanke and Hansen.

Further research is needed to extend these applications of GCV to more general nonlinear methods and nonlinear problems. We also need more specific asymptotic theoretical results.

Acknowledgements. We are grateful to J. Browne and G.T. Herman for their support on the use of SNARK93.

Álvaro R. De Pierro was partially supported by CNPq Grant No. 300969/2003-1 and FAPESP Grant No. 2002/07153-2, Brazil.

Received: 11/XI/05. Accepted: 30/XI/05.

#647/05.

  • [1] D.M. Allen, The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 16(1) (1974), 125-127.
  • [2] A. Björck and T. Elfving, Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations. BIT, 19 (1979), 145-163.
  • [3] Å. Björck, Numerical Methods for Least Squares Problems. SIAM, Philadelphia (1996).
  • [4] J. Browne and A.R. De Pierro, A row-action alternative to the em algorithm for maximizing likelihoods in emission tomography. IEEE Trans. Med. Imag., 15 (1996), 687-699.
  • [5] J.A. Browne, G.T. Herman and D. Odhner, SNARK93 a programming system for image reconstruction from projections. Technical Report MIPG198, Department of Radiology, University of Pennsylvania, 1993.
  • [6] P. Craven and G. Wahba, Smoothing noisy data with spline functions. Numer. Math., 31 (1979), 377-403.
  • [7] B. Eicke, A.K. Louis, and R. Plato, The instability of some gradient methods for ill-posed problems. Numer. Math., 58 (1990), 129-134.
  • [8] L. Eldén, A note on the computation of the generalized cross-validation function for ill-conditioned least squares problems. BIT, 24 (1984), 467-472.
  • [9] H. Engl, M. Hanke and A. Neubauer, Regularization of inverse problems Kluwer Academic Publishers Group, Dordrecht (1996).
  • [10] H.E. Fleming, Equivalence of regularization and truncated iteration in the solution of ill-posed image reconstruction problems. Linear Algebra and its Appl., 130 (1990), 133-150.
  • [11] D.A. Girard, A fast 'Monte-Carlo cross-validation' procedure for large least squares problems with noisy data. Numer. Math., 56 (1989), 1-23.
  • [12] G. Golub and U. von Matt, Generalized cross-validation for large-scale problems. Journal of Computational and Graphical Statistics, 6 (1997), 1-34.
  • [13] G.H. Golub, M.T. Heath and G. Wahba, Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics, 21 (1979), 215-223.
  • [14] M. Hanke, Conjugate Gradient Type Methods for Ill-Posed Problems Longman Scientific & Technical, Essex,UK (1995).
  • [15] M. Hanke and P.C. Hansen, Regularization methods for large-scale problems. Surv. Math. Ind., 3 (1993), 253-315.
  • [16] P.C. Hansen, Rank-Deficient and Discrete Ill-Posed Problems SIAM, Philadelphia (1998).
  • [17] G.T. Herman, Image Reconstruction from Projections: The Fundamentals of Computerized Tomography Academic Press, New York (1980).
  • [18] G.T. Herman and L.B. Meyer, Algebraic reconstruction techniques can be made computationally efficient. IEEE Trans. Med. Imaging, 12 (1993), 600-609.
  • [19] G.T. Herman and D. Odhner, Performance evaluation of an iterative image reconstruction algorithm for positron emission tomography. IEEE Trans. Med. Imaging, 10 (1991), 336-346.
  • [20] L. Kaufman, Implementing and accelerating the EM algorithm for positron emission tomography. IEEE Trans. Med. Imag., 6 (1987), 37-51.
  • [21] N. Nguyen, P. Milanfar and G. Golub, Efficient generalized cross-validation with applications to parametric image restoration and resolution enhancement. IEEE Transactions on Image Processing, 10(9) (2001), 1299-1308.
  • [22] R. Plato, Optimal algorithms for linear ill-posed problems yield regularization methods. Numer. Funct. Anal. Optim., 11 (1990), 111-118.
  • [23] R.J. Santos, Equivalence of regularization and truncated iteration for general ill-posed problems. Linear Algebra and its Appl., 236 (1996), 25-33.
  • [24] R.J. Santos, Preconditioning conjugate gradient with symmetric algebraic reconstruction technique (ART) in computerized tomography. Applied Numerical Mathematics, 47 (2003), 255-263.
  • [25] R.J. Santos and A.R. De Pierro, A cheaper way to compute generalized cross-validation as a stopping rule for linear stationary iterative methods. Journal of Computational and Graphical Statistics, 12 (2003), 417-433.
  • [26] S.D. Silvey, Statistical Inference Penguin, Harmondsworth (1970).
  • [27] M.M. Ter-Pogossian et al., Positron emission tomography. Scientific American, October 1980, 170-181.
  • [28] A. van der Sluis and H.A. van der Vorst, Numerical solution of large, sparse linear algebraic systems arising from tomographic problems. In G. Nolet, editor, Seismic Tomography D. Reidel Pub. Comp., Dordrecht, The Netherlands (1987).
  • [29] A. van der Sluis and H.A. van der Vorst, SIRT and CG type methods for the iterative solution of sparse linear least squares problems. Linear Algebra and its Appl., 130 (1990), 257-303.
  • [30] Y. Vardi, L.A. Shepp and L. Kaufman, A statistical model for positron emission tomography. J. Amer. Stat. Assoc., 80(389) (1985), 8-37.
  • [31] C. Vogel, Computational methods for inverse problems SIAM, Philadelphia (2002).
  • [32] G. Wahba, Three topics in ill-posed problems. In H. Engl and C. Groetsch, editors, Inverse and Ill-Posed Problems Academic Press, New York (1987).
  • [33] G. Wahba, Spline Models for Observational Data SIAM, Philadelphia (1991).

Publication Dates

  • Publication in this collection
    27 Sept 2006
  • Date of issue
    2006

Sociedade Brasileira de Matemática Aplicada e Computacional (SBMAC), Rua Maestro João Seppe, nº 900, 16º andar, Sala 163, 13561-120 São Carlos, SP, Brazil. Tel./Fax: 55 16 3412-9752.
E-mail: sbmac@sbmac.org.br