Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.
In mathematics, particularly linear algebra and numerical analysis, the Gram–Schmidt process or Gram–Schmidt algorithm is a method for constructing, from a given set of vectors, a set of two or more vectors that are perpendicular to each other.
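The vector projection of a vector $\mathbf{v}$ on a nonzero vector $\mathbf{u}$ is defined as
$$\operatorname{proj}_{\mathbf{u}}(\mathbf{v}) = \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle}\,\mathbf{u},$$
where $\langle \mathbf{v}, \mathbf{u} \rangle$ denotes the inner product of the vectors $\mathbf{v}$ and $\mathbf{u}$. This means that $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ is the orthogonal projection of $\mathbf{v}$ onto the line spanned by $\mathbf{u}$. If $\mathbf{u}$ is the zero vector, then $\operatorname{proj}_{\mathbf{u}}(\mathbf{v})$ is defined as the zero vector.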
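The projection can be sketched in a few lines of Python. This is only an illustration: it assumes real coordinate vectors with the standard dot product, and the names `inner` and `proj` are hypothetical helpers, not part of any library. Exact fractions are used so that the orthogonality of the residual can be checked without rounding.

```python
from fractions import Fraction

def inner(a, b):
    """Standard dot product of two real coordinate vectors (assumed inner product)."""
    return sum(x * y for x, y in zip(a, b))

def proj(u, v):
    """Orthogonal projection of v onto the line spanned by u.

    Returns the zero vector when u is the zero vector, matching the convention
    stated in the text.
    """
    uu = inner(u, u)
    if uu == 0:
        return [0 for _ in v]
    c = inner(v, u) / uu
    return [c * x for x in u]

u = [Fraction(1), Fraction(2)]
v = [Fraction(3), Fraction(4)]
p = proj(u, v)                             # component of v along u
residual = [a - b for a, b in zip(v, p)]   # what is left of v after removing p
```

The residual `v - proj(u, v)` is orthogonal to `u`, which is the property every step of the process relies on.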
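Given $k$ vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$, the Gram–Schmidt process defines the vectors $\mathbf{u}_1, \ldots, \mathbf{u}_k$ as follows:
$$\begin{aligned}
\mathbf{u}_1 &= \mathbf{v}_1, & \mathbf{e}_1 &= \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|} \\
\mathbf{u}_2 &= \mathbf{v}_2 - \operatorname{proj}_{\mathbf{u}_1}(\mathbf{v}_2), & \mathbf{e}_2 &= \frac{\mathbf{u}_2}{\|\mathbf{u}_2\|} \\
\mathbf{u}_3 &= \mathbf{v}_3 - \operatorname{proj}_{\mathbf{u}_1}(\mathbf{v}_3) - \operatorname{proj}_{\mathbf{u}_2}(\mathbf{v}_3), & \mathbf{e}_3 &= \frac{\mathbf{u}_3}{\|\mathbf{u}_3\|} \\
&\;\;\vdots & &\;\;\vdots \\
\mathbf{u}_k &= \mathbf{v}_k - \sum_{j=1}^{k-1} \operatorname{proj}_{\mathbf{u}_j}(\mathbf{v}_k), & \mathbf{e}_k &= \frac{\mathbf{u}_k}{\|\mathbf{u}_k\|}
\end{aligned}$$
The sequence $\mathbf{u}_1, \ldots, \mathbf{u}_k$ is the required system of orthogonal vectors, and the normalized vectors $\mathbf{e}_1, \ldots, \mathbf{e}_k$ form an orthonormal set. The calculation of the sequence $\mathbf{u}_1, \ldots, \mathbf{u}_k$ is known as Gram–Schmidt orthogonalization, and the calculation of the sequence $\mathbf{e}_1, \ldots, \mathbf{e}_k$ is known as Gram–Schmidt orthonormalization.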
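As a rough illustration of the definition, the following Python sketch computes both sequences in the classical form, where every projection coefficient is taken from the original input vector. It assumes real vectors, the standard dot product, and linearly independent input; the function names are hypothetical.

```python
import math

def inner(a, b):
    """Standard dot product (assumed inner product)."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Return (us, es): the orthogonal vectors and their normalized versions.

    Assumes the input vectors are linearly independent, so no u is zero.
    """
    us = []
    for v in vectors:
        u = list(v)
        for w in us:
            # Classical form: the coefficient uses the original v, not the running u.
            c = inner(v, w) / inner(w, w)
            u = [ui - c * wi for ui, wi in zip(u, w)]
        us.append(u)
    es = [[x / math.sqrt(inner(u, u)) for x in u] for u in us]
    return us, es

us, es = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

For this input the orthogonal vectors come out as approximately (1, 1, 0), (1/2, -1/2, 1) and (-2/3, 2/3, 2/3), up to rounding.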
To check that these formulas yield an orthogonal sequence, first compute $\langle \mathbf{u}_1, \mathbf{u}_2 \rangle$ by substituting the above formula for $\mathbf{u}_2$: we get zero. Then use this to compute $\langle \mathbf{u}_1, \mathbf{u}_3 \rangle$ again by substituting the formula for $\mathbf{u}_3$: we get zero. For arbitrary $k$ the proof is accomplished by mathematical induction.
Geometrically, this method proceeds as follows: to compute $\mathbf{u}_i$, it projects $\mathbf{v}_i$ orthogonally onto the subspace $U$ generated by $\mathbf{u}_1, \ldots, \mathbf{u}_{i-1}$, which is the same as the subspace generated by $\mathbf{v}_1, \ldots, \mathbf{v}_{i-1}$. The vector $\mathbf{u}_i$ is then defined to be the difference between $\mathbf{v}_i$ and this projection, guaranteed to be orthogonal to all of the vectors in the subspace $U$.
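The first two steps of this check can be written out explicitly. The computation below assumes a real inner product (so that it is symmetric) and the usual definition of each new vector as the input vector minus its projections onto the earlier ones:

$$\begin{aligned}
\langle \mathbf{u}_1, \mathbf{u}_2 \rangle
&= \langle \mathbf{u}_1, \mathbf{v}_2 \rangle - \frac{\langle \mathbf{v}_2, \mathbf{u}_1 \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle}\,\langle \mathbf{u}_1, \mathbf{u}_1 \rangle
= \langle \mathbf{u}_1, \mathbf{v}_2 \rangle - \langle \mathbf{v}_2, \mathbf{u}_1 \rangle = 0, \\
\langle \mathbf{u}_1, \mathbf{u}_3 \rangle
&= \langle \mathbf{u}_1, \mathbf{v}_3 \rangle - \frac{\langle \mathbf{v}_3, \mathbf{u}_1 \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle}\,\langle \mathbf{u}_1, \mathbf{u}_1 \rangle - \frac{\langle \mathbf{v}_3, \mathbf{u}_2 \rangle}{\langle \mathbf{u}_2, \mathbf{u}_2 \rangle}\,\langle \mathbf{u}_1, \mathbf{u}_2 \rangle
= 0 - 0 = 0.
\end{aligned}$$

The second line uses the first: the last term vanishes because $\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = 0$ has already been established, which is the pattern the induction step repeats.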
The Gram–Schmidt process also applies to a linearly independent countably infinite sequence $(\mathbf{v}_i)_i$. The result is an orthogonal (or orthonormal) sequence $(\mathbf{u}_i)_i$ such that for every natural number $n$, the algebraic span of $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is the same as that of $\mathbf{u}_1, \ldots, \mathbf{u}_n$.
If the Gram–Schmidt process is applied to a linearly dependent sequence, it outputs the $\mathbf{0}$ vector on the $i$th step, assuming that $\mathbf{v}_i$ is a linear combination of $\mathbf{v}_1, \ldots, \mathbf{v}_{i-1}$. If an orthonormal basis is to be produced, then the algorithm should test for zero vectors in the output and discard them because no multiple of a zero vector can have a length of 1. The number of vectors output by the algorithm will then be the dimension of the space spanned by the original inputs.
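The zero-vector test can be sketched as follows. This is a minimal illustration, not a definitive implementation: in floating-point arithmetic an exact zero is rarely produced, so the sketch compares the remaining norm against a relative tolerance `tol`, which is an assumed parameter chosen here for illustration. The function name is hypothetical.

```python
import math

def inner(a, b):
    """Standard dot product (assumed inner product)."""
    return sum(x * y for x, y in zip(a, b))

def orthonormal_basis(vectors, tol=1e-12):
    """Orthonormalize the input, discarding vectors that reduce to (near) zero.

    tol is an assumed relative threshold: a vector is dropped when what remains
    after removing its components along the basis found so far is negligible
    compared with its original length.
    """
    basis = []
    for v in vectors:
        u = list(v)
        for e in basis:
            c = inner(u, e)
            u = [ui - c * ei for ui, ei in zip(u, e)]
        norm = math.sqrt(inner(u, u))
        if norm > tol * math.sqrt(inner(v, v)):
            basis.append([x / norm for x in u])
    return basis

# The second vector is a multiple of the first, so it is discarded.
vs = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
basis = orthonormal_basis(vs)
```

The length of `basis` then equals the dimension of the span of the inputs, here 2.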
A variant of the Gram–Schmidt process using transfinite recursion applied to a (possibly uncountably) infinite sequence of vectors $(v_\alpha)_{\alpha<\lambda}$ yields a set of orthonormal vectors $(u_\alpha)_{\alpha<\kappa}$ with $\kappa \leq \lambda$ such that for any $\alpha \leq \lambda$, the completion of the span of $\{u_\beta : \beta < \min(\alpha, \kappa)\}$ is the same as that of $\{v_\beta : \beta < \alpha\}$. In particular, when applied to an (algebraic) basis of a Hilbert space (or, more generally, a basis of any dense subspace), it yields a (functional-analytic) orthonormal basis. Note that in the general case often the strict inequality $\kappa < \lambda$ holds, even if the starting set was linearly independent, and the span of $(u_\alpha)_{\alpha<\kappa}$ need not be a subspace of the span of $(v_\alpha)_{\alpha<\lambda}$ (rather, it is a subspace of its completion).
Example
Euclidean space
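Consider the following set of vectors in $\mathbb{R}^2$ (with the conventional inner product):
$$S = \left\{ \mathbf{v}_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix},\ \mathbf{v}_2 = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \right\}.$$
Now, perform Gram–Schmidt to obtain an orthogonal set of vectors:
$$\mathbf{u}_1 = \mathbf{v}_1 = \begin{bmatrix} 3 \\ 1 \end{bmatrix}$$
$$\mathbf{u}_2 = \mathbf{v}_2 - \operatorname{proj}_{\mathbf{u}_1}(\mathbf{v}_2) = \begin{bmatrix} 2 \\ 2 \end{bmatrix} - \frac{8}{10} \begin{bmatrix} 3 \\ 1 \end{bmatrix} = \begin{bmatrix} -2/5 \\ 6/5 \end{bmatrix}.$$
We check that the vectors $\mathbf{u}_1$ and $\mathbf{u}_2$ are indeed orthogonal:
$$\langle \mathbf{u}_1, \mathbf{u}_2 \rangle = \left\langle \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \begin{bmatrix} -2/5 \\ 6/5 \end{bmatrix} \right\rangle = -\frac{6}{5} + \frac{6}{5} = 0,$$
noting that if the dot product of two vectors is 0 then they are orthogonal.
For non-zero vectors, we can then normalize the vectors by dividing out their sizes as shown above:
$$\mathbf{e}_1 = \frac{1}{\sqrt{10}} \begin{bmatrix} 3 \\ 1 \end{bmatrix}, \qquad \mathbf{e}_2 = \frac{1}{\sqrt{40/25}} \begin{bmatrix} -2/5 \\ 6/5 \end{bmatrix} = \frac{1}{\sqrt{10}} \begin{bmatrix} -1 \\ 3 \end{bmatrix}.$$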
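A hand computation of this kind can be double-checked with exact rational arithmetic. The sketch below takes the input vectors (3, 1) and (2, 2) in the plane with the standard dot product; the variable names are chosen only for illustration.

```python
from fractions import Fraction

def inner(a, b):
    """Standard dot product on coordinate vectors."""
    return sum(x * y for x, y in zip(a, b))

v1 = [Fraction(3), Fraction(1)]
v2 = [Fraction(2), Fraction(2)]

u1 = v1
c = inner(v2, u1) / inner(u1, u1)           # projection coefficient of v2 on u1
u2 = [x - c * y for x, y in zip(v2, u1)]    # v2 minus its projection onto u1

dot = inner(u1, u2)                              # orthogonality check
norms_squared = (inner(u1, u1), inner(u2, u2))   # squared lengths used for normalizing
```

Dividing each vector by the square root of its squared length gives the orthonormal pair; both normalized vectors end up with the common factor one over the square root of 10.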
Properties
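Denote by $\operatorname{GS}(\mathbf{v}_1, \ldots, \mathbf{v}_k)$ the result of applying the Gram–Schmidt process to a collection of vectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$. This yields a map $\operatorname{GS}\colon (\mathbb{R}^n)^k \to (\mathbb{R}^n)^k$.
Let $g\colon \mathbb{R}^n \to \mathbb{R}^n$ be orthogonal (with respect to the given inner product). Then we have
$$\operatorname{GS}(g(\mathbf{v}_1), \ldots, g(\mathbf{v}_k)) = \left( g(\operatorname{GS}(\mathbf{v}_1, \ldots, \mathbf{v}_k)_1), \ldots, g(\operatorname{GS}(\mathbf{v}_1, \ldots, \mathbf{v}_k)_k) \right).$$
Further, a parametrized version of the Gram–Schmidt process yields a (strong) deformation retraction of the general linear group $\mathrm{GL}(\mathbb{R}^n)$ onto the orthogonal group $O(\mathbb{R}^n)$.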
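The compatibility with orthogonal maps can be checked numerically in a small case. The sketch below is an illustration under stated assumptions: it works in the plane with the standard dot product, uses a rotation as the orthogonal map, and the helper names and the angle 0.7 are arbitrary choices.

```python
import math

def inner(a, b):
    """Standard dot product (assumed inner product)."""
    return sum(x * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors."""
    es = []
    for v in vectors:
        u = list(v)
        for e in es:
            c = inner(v, e)
            u = [ui - c * ei for ui, ei in zip(u, e)]
        n = math.sqrt(inner(u, u))
        es.append([x / n for x in u])
    return es

def rotate(v, theta):
    """Rotation of the plane by angle theta, an orthogonal map."""
    c, s = math.cos(theta), math.sin(theta)
    return [c * v[0] - s * v[1], s * v[0] + c * v[1]]

vs = [[3.0, 1.0], [2.0, 2.0]]
theta = 0.7
lhs = gram_schmidt([rotate(v, theta) for v in vs])    # rotate first, then orthonormalize
rhs = [rotate(e, theta) for e in gram_schmidt(vs)]    # orthonormalize first, then rotate
max_diff = max(abs(a - b) for p, q in zip(lhs, rhs) for a, b in zip(p, q))
```

Both orders agree up to rounding error, as the identity predicts.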
Numerical stability
When this process is implemented on a computer, the vectors are often not quite orthogonal, due to rounding errors. For the Gram–Schmidt process as described above (sometimes referred to as "classical Gram–Schmidt") this loss of orthogonality is particularly bad; therefore, it is said that the (classical) Gram–Schmidt process is numerically unstable.
The Gram–Schmidt process can be stabilized by a small modification; this version is sometimes referred to as modified Gram-Schmidt or MGS. This approach gives the same result as the original formula in exact arithmetic and introduces smaller errors in finite-precision arithmetic.
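Instead of computing the vector $\mathbf{u}_k$ as
$$\mathbf{u}_k = \mathbf{v}_k - \operatorname{proj}_{\mathbf{u}_1}(\mathbf{v}_k) - \operatorname{proj}_{\mathbf{u}_2}(\mathbf{v}_k) - \cdots - \operatorname{proj}_{\mathbf{u}_{k-1}}(\mathbf{v}_k),$$
it is computed as
$$\begin{aligned}
\mathbf{u}_k^{(1)} &= \mathbf{v}_k - \operatorname{proj}_{\mathbf{u}_1}(\mathbf{v}_k), \\
\mathbf{u}_k^{(2)} &= \mathbf{u}_k^{(1)} - \operatorname{proj}_{\mathbf{u}_2}\left(\mathbf{u}_k^{(1)}\right), \\
&\;\;\vdots \\
\mathbf{u}_k^{(k-1)} &= \mathbf{u}_k^{(k-2)} - \operatorname{proj}_{\mathbf{u}_{k-1}}\left(\mathbf{u}_k^{(k-2)}\right), \\
\mathbf{e}_k &= \frac{\mathbf{u}_k^{(k-1)}}{\left\|\mathbf{u}_k^{(k-1)}\right\|}.
\end{aligned}$$
This method is used in the previous animation, when the intermediate vector $\mathbf{v}'_3$ is used when orthogonalizing the blue vector $\mathbf{v}_3$.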
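Here is another description of the modified algorithm. Given the vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n$, in our first step we produce vectors $\mathbf{v}_1, \mathbf{v}_2^{(1)}, \ldots, \mathbf{v}_n^{(1)}$ by removing components along the direction of $\mathbf{v}_1$. In formulas, $\mathbf{v}_k^{(1)} := \mathbf{v}_k - \frac{\langle \mathbf{v}_k, \mathbf{v}_1 \rangle}{\langle \mathbf{v}_1, \mathbf{v}_1 \rangle} \mathbf{v}_1$. After this step we already have two of our desired orthogonal vectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$, namely $\mathbf{u}_1 = \mathbf{v}_1$ and $\mathbf{u}_2 = \mathbf{v}_2^{(1)}$, but we also made $\mathbf{v}_3^{(1)}, \ldots, \mathbf{v}_n^{(1)}$ already orthogonal to $\mathbf{u}_1$. Next, we orthogonalize those remaining vectors against $\mathbf{u}_2 = \mathbf{v}_2^{(1)}$. This means we compute $\mathbf{v}_3^{(2)}, \mathbf{v}_4^{(2)}, \ldots, \mathbf{v}_n^{(2)}$ by subtraction: $\mathbf{v}_k^{(2)} := \mathbf{v}_k^{(1)} - \frac{\langle \mathbf{v}_k^{(1)}, \mathbf{u}_2 \rangle}{\langle \mathbf{u}_2, \mathbf{u}_2 \rangle} \mathbf{u}_2$. Now we have stored the vectors $\mathbf{v}_1, \mathbf{v}_2^{(1)}, \mathbf{v}_3^{(2)}, \mathbf{v}_4^{(2)}, \ldots, \mathbf{v}_n^{(2)}$, where the first three vectors are already $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3$ and the remaining vectors are already orthogonal to $\mathbf{u}_1, \mathbf{u}_2$. As should be clear now, the next step orthogonalizes $\mathbf{v}_4^{(2)}, \ldots, \mathbf{v}_n^{(2)}$ against $\mathbf{u}_3 = \mathbf{v}_3^{(2)}$. Proceeding in this manner we find the full set of orthogonal vectors $\mathbf{u}_1, \ldots, \mathbf{u}_n$. If orthonormal vectors are desired, then we normalize as we go, so that the denominators in the subtraction formulas turn into ones.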
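The difference between the two variants shows up clearly on nearly parallel inputs. The sketch below is an illustration under stated assumptions: it uses the standard dot product, normalizes as it goes, and feeds both variants three vectors that differ only in one tiny entry `eps` (a commonly used test case, with the value of `eps` chosen here so that its square is lost to rounding in double precision). All names are hypothetical. The modified variant is written in the vector-by-vector form, which performs the same arithmetic as the sweep described above in a different order.

```python
import math

def inner(a, b):
    """Standard dot product (assumed inner product)."""
    return sum(x * y for x, y in zip(a, b))

def normalize(u):
    n = math.sqrt(inner(u, u))
    return [x / n for x in u]

def classical_gs(vectors):
    es = []
    for v in vectors:
        u = list(v)
        for e in es:
            c = inner(v, e)   # coefficient taken from the original vector v
            u = [ui - c * ei for ui, ei in zip(u, e)]
        es.append(normalize(u))
    return es

def modified_gs(vectors):
    es = []
    for v in vectors:
        u = list(v)
        for e in es:
            c = inner(u, e)   # coefficient taken from the partially orthogonalized u
            u = [ui - c * ei for ui, ei in zip(u, e)]
        es.append(normalize(u))
    return es

def max_off_diagonal(es):
    """Largest absolute inner product between two distinct output vectors."""
    return max(abs(inner(es[i], es[j])) for i in range(len(es)) for j in range(i))

eps = 1e-8
vs = [[1.0, eps, 0.0, 0.0], [1.0, 0.0, eps, 0.0], [1.0, 0.0, 0.0, eps]]
cgs_error = max_off_diagonal(classical_gs(vs))
mgs_error = max_off_diagonal(modified_gs(vs))
```

In IEEE double precision the classical variant leaves two of the output vectors far from orthogonal on this input, while the modified variant keeps the loss of orthogonality on the order of `eps`.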
Algorithm
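The following MATLAB algorithm implements Gram–Schmidt orthonormalization. Because each coefficient is computed from the partially orthogonalized column U(:,i) rather than from the original V(:,i), the loop as written follows the modified (stabilized) form rather than the classical one. The vectors v1, ..., vk (columns of matrix V, so that V(:,j) is the jth vector) are replaced by orthonormal vectors (columns of U) which span the same subspace.

    function U = gramschmidt(V)
        % Columns of V are the input vectors; columns of U are orthonormal.
        [n, k] = size(V);
        U = zeros(n,k);
        U(:,1) = V(:,1) / norm(V(:,1));
        for i = 2:k
            U(:,i) = V(:,i);
            for j = 1:i-1
                % Remove the component of the running column along U(:,j).
                U(:,i) = U(:,i) - (U(:,j)'*U(:,i)) * U(:,j);
            end
            U(:,i) = U(:,i) / norm(U(:,i));
        end
    end
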
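The cost of this algorithm is asymptotically O(nk^2) floating point operations, where n is the dimensionality of the vectors and k is the number of vectors: the ith column requires i − 1 projections, each costing O(n) operations.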
Via Gaussian elimination
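If the rows {v1, ..., vk} are written as a matrix $A$, then applying Gaussian elimination to the augmented matrix $\left[ A A^{\mathsf{T}} \mid A \right]$ will produce the orthogonalized vectors in place of $A$. However the matrix $A A^{\mathsf{T}}$ must be brought to row echelon form, using only the row operation of adding a scalar multiple of one row to another. For example, taking the vectors $\mathbf{v}_1 = (3, 1)$ and $\mathbf{v}_2 = (2, 2)$ of the Euclidean example, we have
$$\left[ A A^{\mathsf{T}} \mid A \right] = \left[ \begin{array}{rr|rr} 10 & 8 & 3 & 1 \\ 8 & 8 & 2 & 2 \end{array} \right],$$
and subtracting $\tfrac{8}{10}$ times the first row from the second gives
$$\left[ \begin{array}{rr|rr} 10 & 8 & 3 & 1 \\ 0 & 8/5 & -2/5 & 6/5 \end{array} \right],$$
whose right-hand block contains the orthogonal vectors $(3, 1)$ and $(-2/5, 6/5)$.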
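The elimination procedure can be sketched in Python as follows. This is a minimal illustration, assuming real, linearly independent row vectors and the standard dot product; the function name is hypothetical. Exact fractions are used so the result can be compared with a hand computation, and only the operation "add a multiple of one row to another" is applied.

```python
from fractions import Fraction

def gram_schmidt_by_elimination(rows):
    """Orthogonalize the given row vectors by eliminating on [A A^T | A].

    Assumes the rows are linearly independent, so every pivot of the Gram
    matrix A A^T is nonzero and no row swaps or scalings are needed.
    """
    A = [[Fraction(x) for x in r] for r in rows]
    k = len(A)
    gram = [[sum(a * b for a, b in zip(A[i], A[j])) for j in range(k)] for i in range(k)]
    aug = [gram[i] + A[i] for i in range(k)]   # augmented matrix [A A^T | A]
    for p in range(k):
        for r in range(p + 1, k):
            factor = aug[r][p] / aug[p][p]
            # Row operation: row r minus factor times pivot row p.
            aug[r] = [x - factor * y for x, y in zip(aug[r], aug[p])]
    return [row[k:] for row in aug]            # right-hand block holds the result

us = gram_schmidt_by_elimination([[3, 1], [2, 2]])
ws = gram_schmidt_by_elimination([[1, 1, 0], [1, 0, 1], [0, 1, 1]])
```

The rows of the returned block are unnormalized; dividing each by its length gives the orthonormal vectors.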
Note that the determinant expression for $\mathbf{u}_k$ is a "formal" determinant, i.e. the matrix contains both scalars and vectors; the meaning of this expression is defined to be the result of a cofactor expansion along the row of vectors.
The determinant formula for the Gram–Schmidt process is computationally (exponentially) slower than the recursive algorithms described above; it is mainly of theoretical interest.
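For concreteness, the determinant formula can be written out as follows. This is a sketch assuming real vectors and the standard form of the formula in terms of Gram determinants, where $D_0 = 1$ and $D_j$ denotes the determinant of the Gram matrix of $\mathbf{v}_1, \ldots, \mathbf{v}_j$:

$$\mathbf{u}_j = \frac{1}{D_{j-1}}
\begin{vmatrix}
\langle \mathbf{v}_1, \mathbf{v}_1 \rangle & \langle \mathbf{v}_2, \mathbf{v}_1 \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_1 \rangle \\
\langle \mathbf{v}_1, \mathbf{v}_2 \rangle & \langle \mathbf{v}_2, \mathbf{v}_2 \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_2 \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle \mathbf{v}_1, \mathbf{v}_{j-1} \rangle & \langle \mathbf{v}_2, \mathbf{v}_{j-1} \rangle & \cdots & \langle \mathbf{v}_j, \mathbf{v}_{j-1} \rangle \\
\mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_j
\end{vmatrix}$$

As a check in the smallest case, take $j = 2$ with $\mathbf{v}_1 = (3, 1)$ and $\mathbf{v}_2 = (2, 2)$. Then $D_1 = \langle \mathbf{v}_1, \mathbf{v}_1 \rangle = 10$, and expanding along the row of vectors gives

$$\mathbf{u}_2 = \frac{1}{10} \begin{vmatrix} 10 & 8 \\ \mathbf{v}_1 & \mathbf{v}_2 \end{vmatrix} = \frac{1}{10}\left( 10\,\mathbf{v}_2 - 8\,\mathbf{v}_1 \right) = \left( -\tfrac{2}{5}, \tfrac{6}{5} \right),$$

which agrees with subtracting from $\mathbf{v}_2$ its projection onto $\mathbf{v}_1$.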
Expressed using geometric algebra
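Expressed using notation used in geometric algebra, the unnormalized results of the Gram–Schmidt process can be expressed as
$$\mathbf{u}_k = \mathbf{v}_k - \sum_{j=1}^{k-1} (\mathbf{v}_k \cdot \mathbf{u}_j)\,\mathbf{u}_j^{-1},$$
which is equivalent to the expression using the $\operatorname{proj}$ operator defined above. The results can equivalently be expressed as
$$\mathbf{u}_k = \mathbf{v}_k \wedge \mathbf{v}_{k-1} \wedge \cdots \wedge \mathbf{v}_1 \left( \mathbf{v}_{k-1} \wedge \cdots \wedge \mathbf{v}_1 \right)^{-1},$$
which is closely related to the expression using determinants above.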
Alternatives
Other orthogonalization algorithms use Householder transformations or Givens rotations. The algorithms using Householder transformations are more stable than the stabilized Gram–Schmidt process. On the other hand, the Gram–Schmidt process produces the $j$th orthogonalized vector after the $j$th iteration, while orthogonalization using Householder reflections produces all the vectors only at the end. This makes only the Gram–Schmidt process applicable for iterative methods like the Arnoldi iteration.
In quantum mechanics there are several orthogonalization schemes with characteristics better suited for certain applications than original Gram–Schmidt. Nevertheless, it remains a popular and effective algorithm for even the largest electronic structure calculations.
Pursell, Lyle; Trimble, S. Y. (1 January 1991). "Gram-Schmidt Orthogonalization by Gauss Elimination". The American Mathematical Monthly. 98 (6): 544–549. doi:10.2307/2324877. JSTOR 2324877.
Doran, Chris; Lasenby, Anthony (2007). Geometric Algebra for Physicists. Cambridge University Press. p. 124. ISBN 978-0-521-71595-9.
Hasegawa, Yukihiro; et al. (2011). "First-principles calculations of electron states of a silicon nanowire with 100,000 atoms on the K computer". Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. pp. 1:1–1:11. doi:10.1145/2063384.2063386. ISBN 9781450307710. S2CID 14316074.
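In the complex case, this assumes that the inner product is linear in the first argument and conjugate-linear in the second. In physics a more common convention is linearity in the second argument, in which case we define $\operatorname{proj}_{\mathbf{u}}(\mathbf{v}) = \frac{\langle \mathbf{u}, \mathbf{v} \rangle}{\langle \mathbf{u}, \mathbf{u} \rangle}\,\mathbf{u}$.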