Matrix sign function

Generalization of signum function to matrices

In mathematics, the matrix sign function is a matrix function on square matrices analogous to the complex sign function.

It was introduced by J. D. Roberts in 1971, as a tool for model reduction and for solving Lyapunov and algebraic Riccati equations, in a Cambridge University technical report that was later published in a journal in 1980.

Definition

The matrix sign function is a generalization of the complex signum function

$\operatorname{csgn}(z) = \begin{cases} 1 & \text{if } \operatorname{Re}(z) > 0, \\ -1 & \text{if } \operatorname{Re}(z) < 0, \end{cases}$

to the matrix-valued analogue $\operatorname{csgn}(A)$. Although the sign function is not analytic, the matrix function is well defined for all matrices that have no eigenvalue on the imaginary axis; see, for example, the Jordan-form-based characterization below (where the derivatives are all zero).

Properties

Theorem: Let $A \in \mathbb{C}^{n \times n}$; then $\operatorname{csgn}(A)^{2} = I$.

Theorem: Let $A \in \mathbb{C}^{n \times n}$; then $\operatorname{csgn}(A)$ is diagonalizable and has eigenvalues $\pm 1$.

Theorem: Let $A \in \mathbb{C}^{n \times n}$; then $(I + \operatorname{csgn}(A))/2$ is a projector onto the invariant subspace associated with the eigenvalues in the right half-plane, and analogously for $(I - \operatorname{csgn}(A))/2$ and the left half-plane.

Theorem: Let $A \in \mathbb{C}^{n \times n}$, and let $A = P \begin{bmatrix} J_{+} & 0 \\ 0 & J_{-} \end{bmatrix} P^{-1}$ be a Jordan decomposition such that $J_{+}$ corresponds to the eigenvalues with positive real part and $J_{-}$ to the eigenvalues with negative real part. Then $\operatorname{csgn}(A) = P \begin{bmatrix} I_{+} & 0 \\ 0 & -I_{-} \end{bmatrix} P^{-1}$, where $I_{+}$ and $I_{-}$ are identity matrices of sizes corresponding to $J_{+}$ and $J_{-}$, respectively.
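
For a diagonalizable matrix this characterization gives a direct way to evaluate the function: map each eigenvalue to the sign of its real part. The following NumPy sketch (the helper name csgn is our own choice) illustrates the eigendecomposition route and checks the involution and projector properties above; it assumes $A$ is diagonalizable with no eigenvalue on the imaginary axis.

    import numpy as np

    def csgn(A):
        # Assumes A is diagonalizable with no eigenvalue on the
        # imaginary axis; defective matrices need the Jordan form.
        w, V = np.linalg.eig(A)                  # A = V diag(w) V^{-1}
        s = np.sign(w.real)                      # csgn of each eigenvalue
        return V @ np.diag(s).astype(complex) @ np.linalg.inv(V)

    A = np.array([[3.0, 1.0], [0.0, -2.0]])      # eigenvalues 3 and -2
    S = csgn(A)
    assert np.allclose(S @ S, np.eye(2))         # csgn(A)^2 = I
    Pr = (np.eye(2) + S) / 2                     # projector onto the
    assert np.allclose(Pr @ Pr, Pr)              # right-half-plane subspace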

Computational methods

The function can be computed with generic methods for matrix functions, but there are also specialized methods.

Newton iteration

The Newton iteration can be derived by observing that $\operatorname{csgn}(x) = \sqrt{x^{2}}/x$, which in terms of matrices can be written as $\operatorname{csgn}(A) = A^{-1}\sqrt{A^{2}}$, where we use the matrix square root. If we apply the Babylonian method to compute the square root of the matrix $A^{2}$, that is, the iteration $X_{k+1} = \tfrac{1}{2}\left(X_{k} + A^{2} X_{k}^{-1}\right)$, and define the new iterate $Z_{k} = A^{-1} X_{k}$, we arrive at the iteration

$Z_{k+1} = \tfrac{1}{2}\left(Z_{k} + Z_{k}^{-1}\right),$

where typically $Z_{0} = A$. Convergence is global, and locally it is quadratic.

The Newton iteration uses the explicit inverse of the iterates $Z_{k}$.
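
A minimal NumPy sketch of this iteration (the function name sign_newton and the stopping rule are our own choices, not part of the original method) might look as follows.

    import numpy as np

    def sign_newton(A, tol=1e-12, max_iter=100):
        # Newton iteration Z <- (Z + Z^{-1}) / 2, started at Z_0 = A.
        Z = A.astype(complex)
        for _ in range(max_iter):
            Z_next = 0.5 * (Z + np.linalg.inv(Z))
            # Stop once the relative change is below the tolerance.
            if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
                return Z_next
            Z = Z_next
        return Z

    A = np.array([[3.0, 1.0], [0.0, -2.0]])
    S = sign_newton(A)
    assert np.allclose(S @ S, np.eye(2))         # sign matrices are involutory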

Newton–Schulz iteration

To avoid the need for an explicit inverse in the Newton iteration, the inverse can be approximated with one step of the Newton iteration for the inverse, $Z_{k}^{-1} \approx Z_{k}\left(2I - Z_{k}^{2}\right)$, derived by Schulz in 1933. Substituting this approximation into the previous method, the new method becomes

$Z_{k+1} = \tfrac{1}{2} Z_{k}\left(3I - Z_{k}^{2}\right).$

Convergence is (still) quadratic, but only local (guaranteed for $\|I - A^{2}\| < 1$).
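
An inverse-free sketch in the same style (again with our own helper name; the up-front check encodes the sufficient condition $\|I - A^{2}\| < 1$):

    import numpy as np

    def sign_newton_schulz(A, tol=1e-12, max_iter=100):
        # Inverse-free Newton-Schulz iteration Z <- Z (3I - Z^2) / 2.
        n = A.shape[0]
        I = np.eye(n)
        # Convergence is only local; this is a sufficient condition.
        if np.linalg.norm(I - A @ A, 2) >= 1:
            raise ValueError("||I - A^2|| >= 1: iteration may diverge")
        Z = A.astype(complex)
        for _ in range(max_iter):
            Z_next = 0.5 * Z @ (3 * I - Z @ Z)
            if np.linalg.norm(Z_next - Z, 1) <= tol * np.linalg.norm(Z_next, 1):
                return Z_next
            Z = Z_next
        return Z

    A = np.array([[1.1, 0.0], [0.0, -0.9]])      # ||I - A^2|| < 1
    S = sign_newton_schulz(A)
    assert np.allclose(S, np.diag([1.0, -1.0]))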

Applications

Solutions of Sylvester equations

Theorem: Let $A, B, C \in \mathbb{R}^{n \times n}$ and assume that $A$ and $B$ are stable; then the unique solution to the Sylvester equation $AX + XB = C$ is given by the $X$ such that

$\begin{bmatrix} -I & 2X \\ 0 & I \end{bmatrix} = \operatorname{csgn}\left(\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix}\right).$

Proof sketch: The result follows from the similarity transform

$\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix} = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}\begin{bmatrix} A & 0 \\ 0 & -B \end{bmatrix}\begin{bmatrix} I & X \\ 0 & I \end{bmatrix}^{-1},$

since

$\operatorname{csgn}\left(\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix}\right) = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}\begin{bmatrix} -I & 0 \\ 0 & I \end{bmatrix}\begin{bmatrix} I & -X \\ 0 & I \end{bmatrix},$

due to the stability of $A$ and $B$, which gives $\operatorname{csgn}(A) = -I$ and $\operatorname{csgn}(-B) = I$.

The theorem is, naturally, also applicable to the Lyapunov equation, $AX + XA^{T} = C$ (the special case $B = A^{T}$). Moreover, due to the block-triangular structure, the Newton iteration simplifies to involve only inverses of $A$ and $A^{T}$.
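
As an illustration, the theorem turns any matrix sign routine into a Sylvester solver; this sketch reuses the sign_newton helper defined above (names are ours) and reads off $X$ from the top-right block.

    import numpy as np

    def solve_sylvester_sign(A, B, C, sign_fn):
        # Solve AX + XB = C for stable A and B via the matrix sign.
        n = A.shape[0]
        M = np.block([[A, -C], [np.zeros((n, n)), -B]])
        S = sign_fn(M)
        return 0.5 * S[:n, n:]                   # top-right block equals 2X

    # Stable A and B (all eigenvalues in the open left half-plane).
    A = np.array([[-2.0, 1.0], [0.0, -3.0]])
    B = np.array([[-1.0, 0.0], [2.0, -4.0]])
    X_true = np.array([[1.0, 2.0], [3.0, 4.0]])
    C = A @ X_true + X_true @ B
    X = solve_sylvester_sign(A, B, C, sign_newton)
    assert np.allclose(X.real, X_true)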

Solutions of algebraic Riccati equations

There is a similar result applicable to the algebraic Riccati equation, $A^{H}P + PA - PFP + Q = 0$. Define $V, W \in \mathbb{C}^{2n \times n}$ as

$\begin{bmatrix} V & W \end{bmatrix} = \operatorname{csgn}\left(\begin{bmatrix} A^{H} & Q \\ F & -A \end{bmatrix}\right) - \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}.$

Under the assumption that $F, Q \in \mathbb{C}^{n \times n}$ are Hermitian and that there exists a unique stabilizing solution, in the sense that $A - FP$ is stable, that solution is given by the over-determined, but consistent, linear system

$VP = -W.$

Proof sketch: The similarity transform

$\begin{bmatrix} A^{H} & Q \\ F & -A \end{bmatrix} = \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix}\begin{bmatrix} -(A - FP) & -F \\ 0 & (A - FP)^{H} \end{bmatrix}\begin{bmatrix} P & -I \\ I & 0 \end{bmatrix}^{-1},$

and the stability of $A - FP$ implies that

$\left(\operatorname{csgn}\left(\begin{bmatrix} A^{H} & Q \\ F & -A \end{bmatrix}\right) - \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}\right)\begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} = \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix}\begin{bmatrix} 0 & Y \\ 0 & -2I \end{bmatrix},$

for some matrix $Y \in \mathbb{C}^{n \times n}$; comparing the first block columns then yields $VP + W = 0$, that is, $VP = -W$.
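
A sketch of the resulting solver (helper names ours; it reuses sign_newton from above and solves the over-determined system by least squares):

    import numpy as np

    def solve_riccati_sign(A, F, Q, sign_fn):
        # Stabilizing solution of A^H P + P A - P F P + Q = 0.
        n = A.shape[0]
        H = np.block([[A.conj().T, Q], [F, -A]])
        M = sign_fn(H) - np.eye(2 * n)           # equals [V  W]
        V, W = M[:, :n], M[:, n:]
        # V P = -W is over-determined but consistent; least squares solves it.
        P, *_ = np.linalg.lstsq(V, -W, rcond=None)
        return P

    # 1x1 sanity check: -p^2 + 1 = 0 with stabilizing root p = 1.
    A = np.array([[0.0]]); F = np.array([[1.0]]); Q = np.array([[1.0]])
    P = solve_riccati_sign(A, F, Q, sign_newton)
    assert np.allclose(P.real, 1.0)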

Computation of the matrix square root

The Denman–Beavers iteration for the square root of a matrix can be derived from the Newton iteration for the matrix sign function by noticing that $A - PIP = 0$ is a degenerate algebraic Riccati equation, and that by definition a solution $P$ is the square root of $A$.
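
Concretely, applying the Newton sign iteration to the block matrix $\begin{bmatrix} 0 & A \\ I & 0 \end{bmatrix}$ decouples into the two coupled Denman–Beavers recurrences; a minimal sketch under the usual assumption that $A$ has no eigenvalues on the closed negative real axis (helper name ours):

    import numpy as np

    def sqrtm_denman_beavers(A, tol=1e-12, max_iter=100):
        # Coupled iteration: Y_k -> sqrt(A) and Z_k -> inv(sqrt(A)).
        Y = A.astype(complex)
        Z = np.eye(A.shape[0], dtype=complex)
        for _ in range(max_iter):
            Y_next = 0.5 * (Y + np.linalg.inv(Z))
            Z_next = 0.5 * (Z + np.linalg.inv(Y))
            if np.linalg.norm(Y_next - Y, 1) <= tol * np.linalg.norm(Y_next, 1):
                return Y_next
            Y, Z = Y_next, Z_next
        return Y

    A = np.array([[4.0, 1.0], [0.0, 9.0]])
    S = sqrtm_denman_beavers(A)
    assert np.allclose(S @ S, A)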

References

  1. Higham, Nicholas J. (2008). Functions of Matrices: Theory and Computation. Philadelphia, PA: Society for Industrial and Applied Mathematics. ISBN 978-0-89871-777-8. OCLC 693957820.
  2. Roberts, J. D. (October 1980). "Linear model reduction and solution of the algebraic Riccati equation by use of the sign function". International Journal of Control. 32 (4): 677–687. doi:10.1080/00207178008922881. ISSN 0020-7179.
  3. Denman, Eugene D.; Beavers, Alex N. (1976). "The matrix sign function and computations in systems". Applied Mathematics and Computation. 2 (1): 63–94. doi:10.1016/0096-3003(76)90020-5. ISSN 0096-3003.
  4. Schulz, Günther (1933). "Iterative Berechnung der reziproken Matrix". ZAMM – Zeitschrift für Angewandte Mathematik und Mechanik. 13 (1): 57–59. Bibcode:1933ZaMM...13...57S. doi:10.1002/zamm.19330130111. ISSN 1521-4001.