Generalized Minimal Residual Method

In mathematics, the generalized minimal residual method (GMRES) is an iterative method for the numerical solution of an indefinite nonsymmetric system of linear equations. The method approximates the solution by the vector in a Krylov subspace with minimal residual. The Arnoldi iteration is used to find this vector.

The GMRES method was developed by Yousef Saad and Martin H. Schultz in 1986. It is a generalization and improvement of the MINRES method due to Paige and Saunders in 1975 (Paige and Saunders, "Solution of Sparse Indefinite Systems of Linear Equations", https://doi.org/10.1137/0712047). The MINRES method requires that the matrix be symmetric, but has the advantage that it only requires handling of three vectors. GMRES is a special case of the DIIS method developed by Peter Pulay in 1980. DIIS is applicable to non-linear systems.


The method
Denote the Euclidean norm of any vector v by \|v\|. Denote the (square) system of linear equations to be solved by Ax = b. The matrix A is assumed to be invertible of size m-by-m. Furthermore, it is assumed that b is normalized, i.e., that \|b\| = 1.

The n-th Krylov subspace for this problem is K_n = K_n(A,r_0) = \operatorname{span} \{ r_0, Ar_0, A^2r_0, \ldots, A^{n-1}r_0 \}, where r_0 = b - A x_0 is the initial residual for an initial guess x_0. In particular, r_0 = b if x_0 = 0.

GMRES approximates the exact solution of Ax = b by the vector x_n \in x_0 + K_n that minimizes the Euclidean norm of the residual r_n= b - Ax_n.

The vectors r_0, Ar_0, \ldots, A^{n-1}r_0 may be nearly linearly dependent, so instead of this basis, the Arnoldi iteration is used to find orthonormal vectors q_1, q_2, \ldots, q_n that form a basis for K_n. In particular, q_1 = \|r_0\|_2^{-1} r_0.
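To see why the raw Krylov vectors are avoided, the following NumPy sketch builds the matrix [r_0, Ar_0, ..., A^{n-1}r_0] explicitly and prints its 2-norm condition number. The bidiagonal test matrix and the helper name `krylov_matrix` are illustrative assumptions and are not part of the method. Because the columns are scaled by growing powers of A, the condition number rises very quickly with n.

```python
import numpy as np

def krylov_matrix(A, r0, n):
    """Raw Krylov matrix [r0, A r0, ..., A^(n-1) r0], with no orthogonalization."""
    K = np.empty((A.shape[0], n))
    K[:, 0] = r0
    for j in range(1, n):
        K[:, j] = A @ K[:, j - 1]   # next power of A applied to r0
    return K

m = 50
# Hypothetical nonsymmetric test matrix: diagonal 1..m plus a superdiagonal of ones.
A = np.diag(np.arange(1.0, m + 1)) + np.diag(np.ones(m - 1), 1)
r0 = np.ones(m)

sizes = (2, 4, 6, 8)
conds = [np.linalg.cond(krylov_matrix(A, r0, n)) for n in sizes]
for n, c in zip(sizes, conds):
    print(f"n = {n}: cond(K_n) = {c:.2e}")
```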

Therefore, the vector x_n \in x_0 + K_n can be written as x_n = x_0 + Q_n y_n with y_n \in \mathbb{R}^n, where Q_n is the m-by-n matrix formed by q_1, \ldots, q_n. In other words, finding the n-th approximation of the solution (i.e., x_n) is reduced to finding the vector y_n, which is determined by minimizing the residual as described below.

The Arnoldi process also constructs \tilde{H}_n, an (n+1)-by-n upper Hessenberg matrix that satisfies AQ_n = Q_{n+1} \tilde{H}_n. This equality is used to simplify the calculation of y_n (see the section Solving the least squares problem). Note that for symmetric matrices, the square upper part of \tilde{H}_n is symmetric tridiagonal, which results in the MINRES method.
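A minimal NumPy sketch of the Arnoldi process with modified Gram-Schmidt follows. The function name `arnoldi`, the random test matrix, and the assumption that no breakdown occurs (every h_{j+1,j} is nonzero) are choices made for illustration. The code returns Q_{n+1} and \tilde{H}_n, then checks the relation A Q_n = Q_{n+1} \tilde{H}_n, the orthonormality of the columns, and the Hessenberg structure.

```python
import numpy as np

def arnoldi(A, r0, n):
    """n steps of Arnoldi with modified Gram-Schmidt.

    Returns Q (m x (n+1)) with orthonormal columns and the (n+1) x n upper
    Hessenberg H with A @ Q[:, :n] == Q @ H up to rounding.
    Assumes no breakdown, i.e. H[j+1, j] != 0 for every j.
    """
    m = A.shape[0]
    Q = np.zeros((m, n + 1))
    H = np.zeros((n + 1, n))
    Q[:, 0] = r0 / np.linalg.norm(r0)          # q_1 = r_0 / ||r_0||
    for j in range(n):
        w = A @ Q[:, j]                        # new Krylov direction
        for i in range(j + 1):                 # orthogonalize against q_1 .. q_{j+1}
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(0)
m, n = 40, 10
A = rng.standard_normal((m, m))
r0 = rng.standard_normal(m)
Q, H = arnoldi(A, r0, n)
print(np.allclose(A @ Q[:, :n], Q @ H))        # Arnoldi relation
print(np.allclose(Q.T @ Q, np.eye(n + 1)))     # orthonormal basis
```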

Because the columns of Q_{n+1} are orthonormal and r_0 = \beta q_1, we have
\begin{align} \left\| r_n \right\| &= \left\| b - A x_n \right\| \\ &= \left\| b - A(x_0 + Q_n y_n) \right\| \\ &= \left\| r_0 - A Q_n y_n \right\| \\ &= \left\| \beta q_1 - A Q_n y_n \right\| \\ &= \left\| \beta q_1 - Q_{n+1} \tilde{H}_n y_n \right\| \\ &= \left\| Q_{n+1} (\beta e_1 - \tilde{H}_n y_n) \right\| \\ &= \left\| \beta e_1 - \tilde{H}_n y_n \right\| \end{align}
where e_1 = (1,0,0,\ldots,0)^T is the first vector in the standard basis of \mathbb{R}^{n+1}, and \beta = \|r_0\|, r_0 being the first trial residual vector (usually b). Hence, x_n can be found by choosing y_n to minimize the Euclidean norm \|\tilde{H}_n y_n - \beta e_1\|, which equals \|r_n\|. This is a linear least squares problem of size n.
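The identity above can be checked numerically. In this sketch, which assumes NumPy and a random test problem, the same compact `arnoldi` helper is restated so the block runs on its own. For an arbitrary y_n, the m-dimensional residual \|b - A x_n\| matches the (n+1)-dimensional quantity \|\beta e_1 - \tilde{H}_n y_n\|. The least squares minimizer y_n gives a residual no larger than that of the arbitrary choice.

```python
import numpy as np

def arnoldi(A, r0, n):
    """n steps of modified Gram-Schmidt Arnoldi (assumes no breakdown)."""
    Q = np.zeros((A.shape[0], n + 1))
    H = np.zeros((n + 1, n))
    Q[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(n):
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
    return Q, H

rng = np.random.default_rng(1)
m, n = 30, 8
A = rng.standard_normal((m, m))
b = rng.standard_normal(m)
x0 = rng.standard_normal(m)                  # a nonzero initial guess
r0 = b - A @ x0
beta = np.linalg.norm(r0)
Q, H = arnoldi(A, r0, n)

e1 = np.zeros(n + 1)
e1[0] = 1.0
y = rng.standard_normal(n)                   # arbitrary y_n, not yet the minimizer
x_n = x0 + Q[:, :n] @ y
full = np.linalg.norm(b - A @ x_n)           # ||r_n|| computed in R^m
small = np.linalg.norm(beta * e1 - H @ y)    # ||beta e_1 - H_tilde_n y_n|| in R^(n+1)

y_best, *_ = np.linalg.lstsq(H, beta * e1, rcond=None)
best = np.linalg.norm(b - A @ (x0 + Q[:, :n] @ y_best))
print(full, small, best)
```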

This yields the GMRES method. On the n-th iteration:

  1. calculate q_n with the Arnoldi method;
  2. find the y_n which minimizes \|r_n\|;
  3. compute x_n = x_0 + Q_n y_n ;
  4. repeat if the residual is not yet small enough.
At every iteration, a matrix-vector product A q_n must be computed. This costs about 2m^2 floating-point operations for general dense matrices of size m, but the cost can decrease to O(m) for sparse matrices. In addition to the matrix-vector product, O(nm) floating-point operations must be computed at the n-th iteration.
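The steps above can be sketched in NumPy as an unrestarted solver. The name `gmres_simple`, the tolerance, and the skew-symmetric test matrix are assumptions made for illustration. Unlike the Givens-based update described later, step 2 here simply calls `np.linalg.lstsq` on the small (n+1)-by-n problem. As in the MATLAB example code, x_n is formed only once, after the loop. The sketch assumes r_0 \ne 0 and max_iter \le m.

```python
import numpy as np

def gmres_simple(A, b, x0, max_iter, tol=1e-10):
    """Unrestarted GMRES sketch: returns x_n and the history of ||r_n|| / ||b||."""
    m = A.shape[0]
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    b_norm = np.linalg.norm(b)
    Q = np.zeros((m, max_iter + 1))          # orthonormal Krylov basis
    H = np.zeros((max_iter + 1, max_iter))   # upper Hessenberg H_tilde
    Q[:, 0] = r0 / beta
    history = [beta / b_norm]
    for n in range(1, max_iter + 1):
        # step 1: one Arnoldi (modified Gram-Schmidt) step fills column n of H_tilde
        w = A @ Q[:, n - 1]
        for i in range(n):
            H[i, n - 1] = Q[:, i] @ w
            w -= H[i, n - 1] * Q[:, i]
        H[n, n - 1] = np.linalg.norm(w)
        # step 2: y_n minimizes ||beta e_1 - H_tilde_n y||
        rhs = np.zeros(n + 1)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[:n + 1, :n], rhs, rcond=None)
        rel_res = np.linalg.norm(rhs - H[:n + 1, :n] @ y) / b_norm
        history.append(rel_res)
        # step 4: stop when converged or when the Krylov space stops growing
        if rel_res <= tol or H[n, n - 1] == 0.0:
            break
        Q[:, n] = w / H[n, n - 1]
    # step 3 (done once): x_n = x_0 + Q_n y_n
    return x0 + Q[:, :n] @ y, history

rng = np.random.default_rng(5)
m = 20
G = rng.standard_normal((m, m))
A = 3.0 * np.eye(m) + (G - G.T) / 2          # hypothetical well-conditioned test matrix
b = rng.standard_normal(m)
x, hist = gmres_simple(A, b, np.zeros(m), m)
print(len(hist) - 1, hist[-1])
```

The recorded residuals never increase, which is consistent with the monotonicity discussed under Convergence.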


Convergence
The n-th iterate minimizes the residual in the Krylov subspace K_n. Since every subspace is contained in the next subspace, the residual does not increase. After m iterations, where m is the size of the matrix A, the Krylov space K_m is the whole of \mathbb{R}^m (unless the iteration has already terminated with the exact solution), and hence the GMRES method arrives at the exact solution in exact arithmetic. However, the idea is that after a small number of iterations (relative to m), the vector x_n is already a good approximation to the exact solution.

This does not happen in general. Indeed, a theorem of Greenbaum, Pták and Strakoš states that for every nonincreasing sequence a_1, \ldots, a_{m-1}, a_m = 0, one can find a matrix A such that \|r_n\| = a_n for all n, where r_n is the residual defined above. In particular, it is possible to find a matrix for which the residual stays constant for m - 1 iterations, and only drops to zero at the last iteration.
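A classic instance of such stagnation uses the cyclic shift matrix, with b = e_1 and x_0 = 0. The NumPy sketch below assumes these choices for illustration. The Krylov space K_n is span{e_1, ..., e_n}, so A K_n is orthogonal to b for n < m. The minimal residual therefore stays at 1 until the last iteration, and then it drops to zero. Because the raw Krylov vectors are exactly orthonormal here, the sketch minimizes over them directly instead of running Arnoldi.

```python
import numpy as np

m = 8
# Cyclic down-shift (a permutation matrix): A e_j = e_{j+1} and A e_m = e_1.
A = np.roll(np.eye(m), 1, axis=0)
b = np.zeros(m)
b[0] = 1.0                                   # b = e_1; with x_0 = 0, r_0 = b

residuals = []
K = b.reshape(-1, 1)                         # basis of K_1 = span{b}
for n in range(1, m + 1):
    # GMRES iterate x_n = K y minimizes ||b - A K y||
    y, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
    residuals.append(np.linalg.norm(b - A @ K @ y))
    K = np.column_stack([K, A @ K[:, -1]])   # append the next Krylov vector A^n b
print(np.round(residuals, 12))
```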

In practice, though, GMRES often performs well. This can be proven in specific situations. If the symmetric part of A, that is (A^T + A)/2, is positive definite, then \|r_n\| \leq \left( 1-\frac{\lambda_{\min}^2(\tfrac{1}{2}(A^T + A))}{ \lambda_{\max}(A^T A)} \right)^{n/2} \|r_0\|, where \lambda_{\mathrm{min}}(M) and \lambda_{\mathrm{max}}(M) denote the smallest and largest eigenvalue of the matrix M, respectively. Note that all results for GCR also hold for GMRES.
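The bound can be checked on an example. This NumPy sketch assumes a hypothetical test matrix A = 4I + S with S skew-symmetric, so the symmetric part of A is exactly 4I. It also assumes x_0 = 0 and restates a compact GMRES residual routine, `gmres_residuals`, so the block is self-contained. It then compares each \|r_n\| with the right-hand side of the inequality.

```python
import numpy as np

def gmres_residuals(A, b, steps):
    """||r_n|| for n = 0..steps of unrestarted GMRES with x_0 = 0 (assumes no breakdown)."""
    m = A.shape[0]
    beta = np.linalg.norm(b)
    Q = np.zeros((m, steps + 1))
    H = np.zeros((steps + 1, steps))
    Q[:, 0] = b / beta
    res = [beta]
    for j in range(steps):
        w = A @ Q[:, j]
        for i in range(j + 1):                   # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        Q[:, j + 1] = w / H[j + 1, j]
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], rhs, rcond=None)
        res.append(np.linalg.norm(rhs - H[:j + 2, :j + 1] @ y))
    return np.array(res)

rng = np.random.default_rng(2)
m, steps = 60, 15
G = rng.standard_normal((m, m))
A = 4.0 * np.eye(m) + (G - G.T) / 2              # symmetric part is 4 I (positive definite)
b = rng.standard_normal(m)

lam_min = np.linalg.eigvalsh((A + A.T) / 2)[0]   # smallest eigenvalue of the symmetric part
lam_max = np.linalg.eigvalsh(A.T @ A)[-1]        # largest eigenvalue of A^T A
rate = 1.0 - lam_min**2 / lam_max
res = gmres_residuals(A, b, steps)
bound = rate ** (np.arange(steps + 1) / 2) * res[0]
print(np.all(res <= bound * (1 + 1e-10)))
```

In runs like this, the observed residuals are usually well below the bound. The inequality is a worst-case guarantee and not a prediction.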

If A is symmetric and positive definite, then we even have \|r_n\| \leq \left( \frac{\kappa_2(A)^2-1}{\kappa_2(A)^2} \right)^{n/2} \|r_0\|, where \kappa_2(A) denotes the condition number of A in the Euclidean norm.
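This bound is the special case of the previous inequality for symmetric positive definite A. The short derivation below fills in the step, using only the definitions already given.

```latex
\begin{align}
A = A^T \succ 0 \;&\Rightarrow\; \tfrac{1}{2}(A^T + A) = A, \qquad A^T A = A^2, \\
\frac{\lambda_{\min}^2\big(\tfrac{1}{2}(A^T + A)\big)}{\lambda_{\max}(A^T A)}
  &= \frac{\lambda_{\min}(A)^2}{\lambda_{\max}(A)^2} = \frac{1}{\kappa_2(A)^2}
  \qquad \text{since } \kappa_2(A) = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)}, \\
1 - \frac{1}{\kappa_2(A)^2} &= \frac{\kappa_2(A)^2 - 1}{\kappa_2(A)^2}.
\end{align}
```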

In the general case, where A is not positive definite but is diagonalizable, A = V \Lambda V^{-1}, we have (taking x_0 = 0) \frac{\|r_n\|}{\|b\|} \le \inf_{p \in P_n} \|p(A)\| \le \kappa_2(V) \inf_{p \in P_n} \max_{\lambda \in \sigma(A)} |p(\lambda)|, where P_n denotes the set of polynomials of degree at most n with p(0) = 1, V is the matrix of eigenvectors appearing in the spectral decomposition of A, and \sigma(A) is the spectrum of A. Roughly speaking, this says that fast convergence occurs when the eigenvalues of A are clustered away from the origin and A is not too far from normality.

Reference: Society for Industrial and Applied Mathematics (1997), ISBN 9780898713619.

All these inequalities bound only the residuals instead of the actual error, that is, the distance between the current iterate x_n and the exact solution.


Extensions of the method
Like other iterative methods, GMRES is usually combined with a preconditioning method in order to speed up convergence.

The cost of the iterations grows with the iteration number n. The orthogonalization at iteration n costs O(nm) operations, so n iterations cost O(n^2 m) in total, and all n basis vectors must be stored. Therefore, the method is sometimes restarted after a number, say k, of iterations, with x_k as the new initial guess. The resulting method is called GMRES(k) or Restarted GMRES. For non-positive definite matrices, this method may suffer from stagnation in convergence, because the restarted subspace is often close to the earlier subspace.
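A minimal NumPy sketch of GMRES(k) follows. Each cycle runs k Arnoldi steps from the current iterate and solves the small least squares problem with `np.linalg.lstsq`. The result then becomes the next initial guess. The function names, the tolerance, and the test matrix (whose symmetric part is positive definite, so each cycle is covered by the convergence bound above) are assumptions made for illustration.

```python
import numpy as np

def gmres_cycle(A, b, x0, k):
    """One cycle of k GMRES steps starting from x0; returns the new iterate x_k."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta == 0.0:
        return x0
    m = A.shape[0]
    Q = np.zeros((m, k + 1))
    H = np.zeros((k + 1, k))
    Q[:, 0] = r0 / beta
    for j in range(k):
        w = A @ Q[:, j]
        for i in range(j + 1):                   # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0.0:                   # breakdown: the solution lies in the current space
            k = j + 1
            break
        Q[:, j + 1] = w / H[j + 1, j]
    rhs = np.zeros(k + 1)
    rhs[0] = beta
    y, *_ = np.linalg.lstsq(H[:k + 1, :k], rhs, rcond=None)
    return x0 + Q[:, :k] @ y

def gmres_restarted(A, b, x0, k, max_cycles, tol=1e-10):
    """GMRES(k): restart every k steps with the last iterate as the new initial guess."""
    x = x0
    for _ in range(max_cycles):
        x = gmres_cycle(A, b, x, k)
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            break
    return x

rng = np.random.default_rng(6)
m = 80
G = rng.standard_normal((m, m))
A = 4.0 * np.eye(m) + (G - G.T) / 2
b = rng.standard_normal(m)
x = gmres_restarted(A, b, np.zeros(m), k=10, max_cycles=200)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

Only k + 1 basis vectors are stored at any time. The price is that information from earlier cycles is discarded.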

The shortcomings of GMRES and restarted GMRES are addressed by the recycling of Krylov subspace in the GCRO type methods such as GCROT and GCRODR. Recycling of Krylov subspaces in GMRES can also speed up convergence when sequences of linear systems need to be solved.


Comparison with other solvers
The Arnoldi iteration reduces to the Lanczos iteration for symmetric matrices. The corresponding method is the minimal residual method (MinRes) of Paige and Saunders. Unlike the unsymmetric case, the MinRes method is given by a three-term recurrence relation. It can be shown that there is no Krylov subspace method for general matrices that is given by a short recurrence relation and yet minimizes the norms of the residuals, as GMRES does.

Another class of methods builds on the unsymmetric Lanczos iteration, in particular the BiCG method. These use a three-term recurrence relation, but they do not attain the minimum residual, and hence the residual does not decrease monotonically for these methods. Convergence is not even guaranteed.

The third class is formed by methods like CGS and BiCGSTAB. These also work with a three-term recurrence relation (hence, without optimality) and they can even terminate prematurely without achieving convergence. The idea behind these methods is to choose the generating polynomials of the iteration sequence suitably.

None of these three classes is the best for all matrices; there are always examples in which one class outperforms the others. Therefore, multiple solvers are tried in practice to see which one is best for a given problem.


Solving the least squares problem
One part of the GMRES method is to find the vector y_n that minimizes \left\| \tilde{H}_n y_n - \beta e_1 \right\|. Note that \tilde{H}_n is an (n+1)-by-n matrix, hence it gives an over-constrained linear system of n+1 equations for n unknowns.

The minimum can be computed using a QR decomposition: find an (n+1)-by-(n+1) orthogonal matrix \Omega_n and an (n+1)-by-n upper triangular matrix \tilde{R}_n such that \Omega_n \tilde{H}_n = \tilde{R}_n. The triangular matrix has one more row than it has columns, so its bottom row consists of zeros. Hence, it can be decomposed as \tilde{R}_n = \begin{bmatrix} R_n \\ 0 \end{bmatrix}, where R_n is an n-by-n (thus square) triangular matrix.

The QR decomposition can be updated cheaply from one iteration to the next, because the Hessenberg matrices differ only by a row of zeros and a column: \tilde{H}_{n+1} = \begin{bmatrix} \tilde{H}_n & h_{n+1} \\ 0 & h_{n+2,n+1} \end{bmatrix}, where h_{n+1} = (h_{1,n+1}, \ldots, h_{n+1,n+1})^T. This implies that premultiplying the Hessenberg matrix with \Omega_n, augmented with zeros and a row with multiplicative identity, yields almost a triangular matrix: \begin{bmatrix} \Omega_n & 0 \\ 0 & 1 \end{bmatrix} \tilde{H}_{n+1} = \begin{bmatrix} R_n & r_{n+1} \\ 0 & \rho \\ 0 & \sigma \end{bmatrix}. This would be triangular if \sigma were zero. To remedy this, one needs the Givens rotation G_n = \begin{bmatrix} I_{n} & 0 & 0 \\ 0 & c_n & s_n \\ 0 & -s_n & c_n \end{bmatrix}, where c_n = \frac{\rho}{\sqrt{\rho^2+\sigma^2}} \quad\text{and}\quad s_n = \frac{\sigma}{\sqrt{\rho^2+\sigma^2}}. With this Givens rotation, we form \Omega_{n+1} = G_n \begin{bmatrix} \Omega_n & 0 \\ 0 & 1 \end{bmatrix}. Indeed, \Omega_{n+1} \tilde{H}_{n+1} = \begin{bmatrix} R_n & r_{n+1} \\ 0 & r_{n+1,n+1} \\ 0 & 0 \end{bmatrix} is a triangular matrix with r_{n+1,n+1} = \sqrt{\rho^2+\sigma^2}.
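The column-by-column update can be sketched in NumPy. The helper names `givens` and `hessenberg_qr` and the random Hessenberg test matrix are illustrative assumptions. Each new column first receives the earlier rotations, which is the action of the augmented \Omega_n. A new rotation then zeroes its subdiagonal entry \sigma. The sketch assumes \rho and \sigma are never both zero. It checks that \tilde{R}_n is upper triangular with a zero last row, and that \tilde{R}_n^T \tilde{R}_n = \tilde{H}_n^T \tilde{H}_n, as it must when \Omega_n is orthogonal.

```python
import numpy as np

def givens(rho, sigma):
    """c, s such that [[c, s], [-s, c]] @ [rho, sigma] = [sqrt(rho^2 + sigma^2), 0]."""
    t = np.hypot(rho, sigma)
    return rho / t, sigma / t

def hessenberg_qr(H):
    """Triangularize an (n+1) x n upper Hessenberg H one column at a time.

    Returns R_tilde (upper triangular, last row zero) and the list of (c, s) pairs.
    """
    R = np.array(H, dtype=float)
    n = R.shape[1]
    rotations = []
    for j in range(n):
        for i, (c, s) in enumerate(rotations):       # apply G_0, ..., G_{j-1} to column j
            R[i, j], R[i + 1, j] = c * R[i, j] + s * R[i + 1, j], -s * R[i, j] + c * R[i + 1, j]
        c, s = givens(R[j, j], R[j + 1, j])          # rho = R[j, j], sigma = R[j + 1, j]
        R[j, j], R[j + 1, j] = np.hypot(R[j, j], R[j + 1, j]), 0.0
        rotations.append((c, s))
    return R, rotations

c, s = givens(3.0, 4.0)
print(np.array([[c, s], [-s, c]]) @ np.array([3.0, 4.0]))   # sigma is rotated to zero

rng = np.random.default_rng(3)
n = 6
H = np.triu(rng.standard_normal((n + 1, n)), -1)             # random upper Hessenberg
R, rotations = hessenberg_qr(H)
print(np.allclose(np.tril(R, -1), 0), np.allclose(R.T @ R, H.T @ H))
```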

Given the QR decomposition, the minimization problem is easily solved by noting that \begin{align} \left\| \tilde{H}_n y_n - \beta e_1 \right\| &= \left\| \Omega_n (\tilde{H}_n y_n - \beta e_1) \right\| \\ &= \left\| \tilde{R}_n y_n - \beta \Omega_n e_1 \right\|. \end{align} Denoting the vector \beta\Omega_n e_1 by \tilde{g}_n = \begin{bmatrix} g_n \\ \gamma_n \end{bmatrix} with g_n \in \mathbb{R}^n and \gamma_n \in \mathbb{R}, this is \begin{align} \left\| \tilde{H}_n y_n - \beta e_1 \right\| &= \left\| \tilde{R}_n y_n - \beta \Omega_n e_1 \right\| \\ &= \left\| \begin{bmatrix} R_n \\ 0 \end{bmatrix} y_n - \begin{bmatrix} g_n \\ \gamma_n \end{bmatrix} \right\|. \end{align} The vector y_n that minimizes this expression is given by y_n = R_n^{-1} g_n, and the minimal residual norm is |\gamma_n|. Again, the vectors g_n are easy to update.
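Putting the pieces together, this NumPy sketch (the function name `hessenberg_lstsq` and the test data are assumptions) applies each Givens rotation to both \tilde{H}_n and \beta e_1. It then solves R_n y_n = g_n and reports |\gamma_n|. The vector update mirrors the two `beta(k)` lines of the MATLAB example code. The result is compared against NumPy's general least squares solver.

```python
import numpy as np

def hessenberg_lstsq(H, beta):
    """Minimize ||H y - beta e_1|| for an (n+1) x n upper Hessenberg H via Givens rotations.

    Returns y_n = R_n^{-1} g_n and the minimal residual norm |gamma_n|.
    """
    R = np.array(H, dtype=float)
    n = R.shape[1]
    g = np.zeros(n + 1)
    g[0] = beta                                   # g_tilde starts as beta * e_1
    for j in range(n):
        t = np.hypot(R[j, j], R[j + 1, j])
        c, s = R[j, j] / t, R[j + 1, j] / t
        # rotate rows j and j+1 (columns before j are already zero in these rows)
        R[[j, j + 1], j:] = np.array([[c, s], [-s, c]]) @ R[[j, j + 1], j:]
        g[j], g[j + 1] = c * g[j], -s * g[j]      # same rotation on g_tilde (g[j+1] was 0)
    # R_n y = g_n; a general solver is used for brevity, back substitution would suffice
    y = np.linalg.solve(np.triu(R[:n, :n]), g[:n])
    return y, abs(g[n])

rng = np.random.default_rng(4)
n = 7
H = np.triu(rng.standard_normal((n + 1, n)), -1)
beta = 2.5
y, gamma = hessenberg_lstsq(H, beta)

rhs = np.zeros(n + 1)
rhs[0] = beta
y_ref, *_ = np.linalg.lstsq(H, rhs, rcond=None)
print(np.allclose(y, y_ref), gamma, np.linalg.norm(H @ y - rhs))
```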

Reference: Springer (2025), ISBN 9780387954523.


Example code

Regular GMRES (MATLAB / GNU Octave)
function [x, e] = gmres(A, b, x, max_iterations, threshold)
  % x on input is the initial guess x_0; e collects the relative residual norms
  n = length(A);
  m = max_iterations;

  % use x as the initial vector
  r = b - A * x;

  b_norm = norm(b);
  err = norm(r) / b_norm;

  % initialize the Givens coefficients, the Krylov basis and the Hessenberg matrix
  sn = zeros(m, 1);
  cs = zeros(m, 1);
  e1 = zeros(m + 1, 1);
  e1(1) = 1;
  e = [err];
  r_norm = norm(r);
  Q = zeros(n, m + 1);
  H = zeros(m + 1, m);
  Q(:, 1) = r / r_norm;
  % Note: this is not the beta scalar in section "The method" above but
  % the beta scalar multiplied by e1
  beta = r_norm * e1;
  for k = 1:m

    % run arnoldi
    [H(1:k+1, k), Q(:, k+1)] = arnoldi(A, Q, k);

    % eliminate the last element in the k-th column of H and store the new rotation
    [H(1:k+1, k), cs(k), sn(k)] = apply_givens_rotation(H(1:k+1, k), cs, sn, k);

    % update the residual vector
    beta(k + 1) = -sn(k) * beta(k);
    beta(k)     = cs(k) * beta(k);
    err         = abs(beta(k + 1)) / b_norm;

    % save the error
    e = [e; err];

    if (err <= threshold)
      break;
    end
  end
  % if threshold is not reached, k = m at this point (and not m+1)

  % calculate the result: H(1:k, 1:k) is now upper triangular (R_k), beta(1:k) is g_k
  y = H(1:k, 1:k) \ beta(1:k);
  x = x + Q(:, 1:k) * y;
end

%----------------------------------------------------%
%                  Arnoldi Function                  %
%----------------------------------------------------%
function [h, q] = arnoldi(A, Q, k)
  h = zeros(k + 1, 1);
  q = A * Q(:, k);   % Krylov vector
  for i = 1:k        % Modified Gram-Schmidt, keeping the Hessenberg matrix
    h(i) = q' * Q(:, i);
    q = q - h(i) * Q(:, i);
  end
  h(k + 1) = norm(q);
  q = q / h(k + 1);
end

%---------------------------------------------------------------------%
%                  Applying Givens Rotation to H col                  %
%---------------------------------------------------------------------%
function [h, cs_k, sn_k] = apply_givens_rotation(h, cs, sn, k)
  % apply the k-1 previous rotations to the new column
  for i = 1:k-1
    temp   =  cs(i) * h(i) + sn(i) * h(i + 1);
    h(i+1) = -sn(i) * h(i) + cs(i) * h(i + 1);
    h(i)   = temp;
  end

  % compute the cos and sin values of the new rotation
  [cs_k, sn_k] = givens_rotation(h(k), h(k + 1));

  % eliminate H(k + 1, k)
  h(k) = cs_k * h(k) + sn_k * h(k + 1);
  h(k + 1) = 0.0;
end

%%----Calculate the Givens rotation matrix----%%
function [cs, sn] = givens_rotation(v1, v2)
%  if (v1 == 0)
%    cs = 0;
%    sn = 1;
%  else
    t = sqrt(v1^2 + v2^2);
%    cs = abs(v1) / t;
%    sn = cs * v2 / v1;
    cs = v1 / t;  % see http://www.netlib.org/eispack/comqr.f
    sn = v2 / t;
%  end
end


See also
  • Biconjugate gradient method
