Misplaced Pages

Moore–Penrose inverse: Difference between revisions

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
Browse history interactively← Previous editContent deleted Content addedVisualWikitext
Revision as of 02:42, 8 October 2021 editMonkeyofscience (talk | contribs)3 editsm Removed NumPy comparison with MATLAB. The two do not use the same method of setting the final tolerance.Tag: wikilinks removed← Previous edit Latest revision as of 17:11, 26 December 2024 edit undoChris2crawford (talk | contribs)359 edits Generalizations: Cited an original (or at least early) paper on the weighted pseudoinverse, which is not described in this article. 
(136 intermediate revisions by 50 users not shown)
Line 1: Line 1:
{{Short description|Most widely known generalized inverse of a matrix}}
In ], and in particular ], the '''Moore–Penrose inverse''' {{tmath| A^+ }} of a ] {{tmath| A }} is the most widely known ] of the ].{{sfn|Ben-Israel|Greville|2003|p=7}}{{sfn|Campbell|Meyer, Jr.|1991|p=10}}{{sfn|Nakamura|1991|p=42}}{{sfn|Rao|Mitra|1971|p=50–51}} It was independently described by ]<ref name="Moore1920">{{cite journal | last=Moore | first=E. H. | author-link=E. H. Moore | title=On the reciprocal of the general algebraic matrix | journal=] | volume=26 |issue=9| pages=394–95 | year=1920 | url =http://projecteuclid.org/euclid.bams/1183425340 | doi = 10.1090/S0002-9904-1920-03322-7 | doi-access=free }}</ref> in 1920, ]<ref name="Bjerhammar1951">{{cite journal | last=Bjerhammar| first=Arne| author-link=Arne Bjerhammar | title=Application of calculus of matrices to method of least squares; with special references to geodetic calculations| journal=Trans. Roy. Inst. Tech. Stockholm | year=1951 | volume = 49}}</ref> in 1951, and ]<ref name="Penrose1955">{{cite journal | last=Penrose | first=Roger | author-link=Roger Penrose | title=A generalized inverse for matrices | journal=] | volume=51 | issue=3 | pages=406–13 | year=1955 | doi=10.1017/S0305004100030401| bibcode=1955PCPS...51..406P | doi-access=free }}</ref> in 1955. Earlier, ] had introduced the concept of a pseudoinverse of ]s in 1903. When referring to a matrix, the term ], without further specification, is often used to indicate the Moore–Penrose inverse. The term ] is sometimes used as a synonym for pseudoinverse.
In ], and in particular ], the '''Moore–Penrose inverse''' {{tmath| A^+ }} of a ] {{tmath| A }}, often called the '''pseudoinverse''', is the most widely known generalization of the ].<ref>{{multiref | {{harvnb|Ben-Israel|Greville|2003|p=7}} | {{harvnb| Campbell|Meyer|1991|p=10}} | {{harvnb|Nakamura|1991|p=42}} | {{harvnb|Rao|Mitra|1971|p=50–51}} }}</ref> It was independently described by ] in 1920,<ref name="Moore1920">{{cite journal | last=Moore | first=E. H. | author-link=E. H. Moore | title=On the reciprocal of the general algebraic matrix | journal=] | volume=26 |issue=9| pages=394–95 | year=1920 | url =http://projecteuclid.org/euclid.bams/1183425340 | doi = 10.1090/S0002-9904-1920-03322-7 | doi-access=free }}</ref> ] in 1951,<ref name="Bjerhammar1951">{{cite journal | last=Bjerhammar| first=Arne| author-link=Arne Bjerhammar | title=Application of calculus of matrices to method of least squares; with special references to geodetic calculations| journal=Trans. Roy. Inst. Tech. Stockholm | year=1951 | volume = 49}}</ref> and ] in 1955.<ref name="Penrose1955">{{cite journal | last=Penrose | first=Roger | author-link=Roger Penrose | title=A generalized inverse for matrices | journal=] | volume=51 | issue=3 | pages=406–13 | year=1955 | doi=10.1017/S0305004100030401| bibcode=1955PCPS...51..406P | doi-access=free }}</ref> Earlier, ] had introduced the concept of a pseudoinverse of ]s in 1903. The terms ''pseudoinverse'' and '']'' are sometimes used as synonyms for the Moore–Penrose inverse of a matrix, but sometimes applied to other elements of algebraic structures which share some but not all properties expected for an ].


A common use of the pseudoinverse is to compute a "best fit" (]) solution to a ] that lacks a solution (see below under ]). A common use of the pseudoinverse is to compute a "best fit" (]) approximate solution to a ] that lacks an exact solution (see below under ]).
Another use is to find the minimum (]) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the statement and proof of results in linear algebra. Another use is to find the minimum (]) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the statement and proof of results in linear algebra.


The pseudoinverse is defined and unique for all matrices whose entries are ] or ] numbers. It can be computed using the ]. The pseudoinverse is defined for all rectangular matrices whose entries are ] or ] numbers. Given a rectangular matrix with real or complex entries, its pseudoinverse is unique.
It can be computed using the ]. In the special case where {{tmath| A}} is a ] (for example, a Hermitian matrix), the pseudoinverse {{tmath| A^+ }} ] the ] of {{tmath| A}} and acts as a traditional inverse of {{tmath| A}} on the subspace ] to the kernel.


==Notation== ==Notation==
In the following discussion, the following conventions are adopted. In the following discussion, the following conventions are adopted.


* {{tmath| \mathbb{k} }} will denote one of the ] of real or complex numbers, denoted {{tmath| \mathbb{R} }}, {{tmath| \mathbb{C} }}, respectively. The vector space of {{tmath| m \times n }} matrices over {{tmath| \mathbb{k} }} is denoted by {{tmath| \mathbb{k}^{m \times n} }}. * {{tmath| \mathbb{K} }} will denote one of the ] of real or complex numbers, denoted {{tmath| \mathbb{R} }}, {{tmath| \mathbb{C} }}, respectively. The vector space of {{tmath| m \times n }} matrices over {{tmath| \mathbb{K} }} is denoted by {{tmath| \mathbb{K}^{m \times n} }}.
* For {{tmath| A \in \mathbb{k}^{m\times n} }}, {{tmath| A^\textsf{T} }} and {{tmath| A^* }} denote the transpose and Hermitian transpose (also called ]) respectively. If <math>\mathbb{k} = \mathbb{R}</math>, then <math>A^* = A^\textsf{T}</math>. * For {{tmath| A \in \mathbb{K}^{m\times n} }}, the transpose is denoted {{tmath| A^\operatorname{T} }} and the Hermitian transpose (also called ]) is denoted {{tmath| A^* }}. If <math>\mathbb{K} = \mathbb{R}</math>, then <math>A^* = A^\operatorname{T}</math>.
* For {{tmath| A \in \mathbb{k}^{m\times n} }}, {{tmath| \operatorname{ran}(A) }} (standing for "]") denotes the ] (image) of {{tmath| A }} (the space spanned by the column vectors of {{tmath| A }}) and {{tmath| \ker(A) }} denotes the ] (null space) of {{tmath| A }}. * For {{tmath| A \in \mathbb{K}^{m\times n} }}, {{tmath| \operatorname{ran}(A) }} (standing for "]") denotes the ] (]) of {{tmath| A }} (the space spanned by the column vectors of {{tmath| A }}) and {{tmath| \ker(A) }} denotes the ] (null space) of {{tmath| A }}.
* Finally, for any positive integer {{tmath| n }}, {{tmath| I_n \in \mathbb{k}^{n\times n} }} denotes the {{tmath| n \times n }} ]. * For any positive integer {{tmath| n }}, the {{tmath| n \times n }} ] is denoted {{tmath| I_n \in \mathbb{K}^{n\times n} }}.


==Definition== ==Definition==
For {{tmath| A \in \mathbb{k}^{m\times n} }}, a pseudoinverse of {{mvar| A}} is defined as a matrix {{tmath| A^+ \in \mathbb{k}^{n\times m} }} satisfying all of the following four criteria, known as the Moore–Penrose conditions:<ref name="Penrose1955"/><ref name="GvL1996">{{cite book | last=Golub | first=Gene H. | author-link=Gene H. Golub |author2=Charles F. Van Loan | title=Matrix computations | url=https://archive.org/details/matrixcomputatio00golu_910 | url-access=limited | edition=3rd | publisher=Johns Hopkins | location=Baltimore | year=1996 | isbn=978-0-8018-5414-9 | pages = –258| author2-link=Charles F. Van Loan }}</ref> For <math>A \in \mathbb{K}^{m\times n}</math>, a pseudoinverse of {{mvar| A}} is defined as a matrix {{tmath| A^+ \in \mathbb{K}^{n\times m} }} satisfying all of the following four criteria, known as the Moore–Penrose conditions:<ref name="Penrose1955"/><ref name="GvL1996">{{cite book | last=Golub | first=Gene H. | author-link=Gene H. Golub |author2=Charles F. Van Loan | title=Matrix computations | url=https://archive.org/details/matrixcomputatio00golu_910 | url-access=limited | edition=3rd | publisher=Johns Hopkins | location=Baltimore | year=1996 | isbn=978-0-8018-5414-9 | pages = –258| author2-link=Charles F. Van Loan }}</ref>
# {{tmath| A A^+ }} need not be the general identity matrix, but it maps all column vectors of {{mvar| A }} to themselves: <math display="block">A A^+ A = \; A.</math>
{{olist
# {{tmath| A^+ }} acts like a ]: <math display="block">A^+ A A^+ = \; A^+.</math>
|
# {{tmath| A A^+ }} is ]: <math display="block">\left(A A^+\right)^* = \; A A^+.</math>
; {{tmath|A A^+ A}} {{tmath|1= = \; A }}: {{tmath| A A^+ }} need not be the general identity matrix, but it maps all column vectors of {{mvar| A }} to themselves;
# {{tmath| A^+A }} is also Hermitian: <math display="block">\left(A^+ A\right)^* = \; A^+ A.</math>
|
; {{tmath|A^+ A A^+ }} {{tmath|1= = \; A^+ }} : {{tmath| A^+ }} acts like a ];
|
; {{tmath|\left(A A^+\right)^* }} {{tmath|1= = \; A A^+ }} : {{tmath| A A^+ }} is ];
|
; {{tmath|\left(A^+ A\right)^* }} {{tmath|1= = \; A^+ A }} : {{tmath| A^+A }} is also Hermitian.
}}
{{tmath| A^+ }} exists for any matrix {{mvar| A }}, but, when the latter has full ] (that is, the rank of {{mvar| A }} is {{tmath| \min \{ m,n \} }}), then {{tmath| A^+ }} can be expressed as a simple algebraic formula.


Note that <math>A^+A</math> and <math>AA^+</math> are idempotent operators, as follows from <math>(AA^+)^2=A A^+</math> and <math>(A^+ A)^2=A^+ A</math>. More specifically, <math>A^+A</math> projects onto the image of <math>A^T</math> (equivalently, the span of the rows of <math>A</math>), and <math>AA^+</math> projects onto the image of <math>A</math> (equivalently, the span of the columns of <math>A</math>). In fact, the above four conditions are fully equivalent to <math>A^+A</math> and <math>AA^+</math> being such orthogonal projections: <math>AA^+</math> projecting onto the image of <math>A</math> implies <math>(A A^+)A=A</math>, and <math>A^+A</math> projecting onto the image of <math>A^T</math> implies <math>(A^+A)A^T=A^T</math>.
In particular, when {{tmath| A }} has linearly independent columns (and thus matrix {{tmath| A^* A }} is invertible), {{tmath| A^+ }} can be computed as
<math display="block"> A^+ = \left(A^* A\right)^{-1} A^*.</math>


The pseudoinverse <math>A^+</math> exists for any matrix <math>A \in \mathbb{K}^{m\times n}</math>. If furthermore <math>A</math> is full ], that is, its rank is {{tmath| \min \{ m,n \} }}, then {{tmath| A^+ }} can be given a particularly simple algebraic expression. In particular:
This particular pseudoinverse constitutes a ''left inverse'', since, in this case, <math> A^+A = I </math>.


When {{mvar| A }} has linearly independent rows (matrix {{tmath| A A^* }} is invertible), {{tmath| A^+ }} can be computed as * When {{tmath| A }} has linearly independent columns (equivalently, {{tmath| A }} is injective, and thus {{tmath| A^* A }} is invertible), {{tmath| A^+ }} can be computed as<math display="block">A^+ = \left(A^* A\right)^{-1} A^*.</math>This particular pseudoinverse is a ''left inverse'', that is, <math>A^+A = I</math>.
<math display="block"> A^+ = A^* \left(A A^*\right)^{-1}.</math> * If, on the other hand, <math>A</math> has linearly independent rows (equivalently, <math>A</math> is surjective, and thus {{tmath| A A^* }} is invertible), {{tmath| A^+ }} can be computed as<math display="block">A^+ = A^* \left(A A^*\right)^{-1}.</math>This is a ''right inverse'', as <math>A A^+ = I</math>.


In the more general case, the pseudoinverse can be expressed leveraging the ]. Any matrix can be decomposed as <math> A=UDV^*</math> for some isometries <math>U,V</math> and diagonal nonnegative real matrix <math>D</math>. The pseudoinverse can then be written as <math>A^+=V D^{+} U^*</math>, where <math>D^{+}</math> is the pseudoinverse of <math>D</math> and can be obtained by transposing the matrix and replacing the nonzero values with their multiplicative inverses.{{sfn|Campbell|Meyer|1991}} That this matrix satisfies the above requirement is directly verified observing that <math>AA^+=UU^*</math> and <math>A^+ A=VV^*</math>, which are the projections onto image and support of <math>A</math>, respectively.
This is a ''right inverse'', as <math> A A^+ = I</math>.


==Properties== ==Properties==
{{hatnote|Proofs for some of these facts may be found on a separate page, ].}}


===Existence and uniqueness=== ===Existence and uniqueness===
The pseudoinverse exists and is unique: for any matrix {{tmath| A }}, there is precisely one matrix {{tmath| A^+ }}, that satisfies the four properties of the definition.<ref name="GvL1996"/> As discussed above, for any matrix {{tmath| A }} there is one and only one pseudoinverse {{tmath| A^+ }}.<ref name="GvL1996"/>


A matrix satisfying the first condition of the definition is known as a generalized inverse. If the matrix also satisfies the second definition, it is called a ]. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions. A matrix satisfying only the first of the conditions given above, namely <math display="inline">A A^+ A = A</math>, is known as a generalized inverse. If the matrix also satisfies the second condition, namely <math display="inline">A^+ A A^+ = A^+</math>, it is called a ]. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.


===Basic properties=== ===Basic properties===
Proofs for the properties below can be found at ].
* If {{tmath| A }} has real entries, then so does {{tmath| A^+ }}. * If {{tmath| A }} has real entries, then so does {{tmath| A^+ }}.
* If {{tmath| A }} is ], its pseudoinverse is its inverse. That is, <math>A^+ = A^{-1}</math>.<ref name="SB2002">{{Cite book | last1=Stoer | first1=Josef | last2=Bulirsch | first2=Roland | title=Introduction to Numerical Analysis | publisher=] | location=Berlin, New York | edition=3rd | isbn=978-0-387-95452-3 | year=2002}}.</ref>{{rp|243}} * If {{tmath| A }} is ], its pseudoinverse is its inverse. That is, <math>A^+ = A^{-1}</math>.<ref name="SB2002">{{Cite book | last1=Stoer | first1=Josef | last2=Bulirsch | first2=Roland | title=Introduction to Numerical Analysis | publisher=] | location=Berlin, New York | edition=3rd | isbn=978-0-387-95452-3 | year=2002}}.</ref>{{rp|243}}
* The pseudoinverse of a ] is its transpose.
* The pseudoinverse of the pseudoinverse is the original matrix: <math>\left(A^+\right)^+ = A</math>.<ref name="SB2002" />{{rp|245}} * The pseudoinverse of the pseudoinverse is the original matrix: <math>\left(A^+\right)^+ = A</math>.<ref name="SB2002" />{{rp|245}}
* Pseudoinversion commutes with transposition, complex conjugation, and taking the conjugate transpose:<ref name="SB2002"/>{{rp|245}} <!-- reference only mentions the last bit --> * Pseudoinversion commutes with transposition, complex conjugation, and taking the conjugate transpose:<ref name="SB2002" />{{rp|245}} <!-- reference only mentions the last bit --> <math display="block">\left(A^\operatorname{T}\right)^+ = \left(A^+\right)^\operatorname{T}, \quad \left(\overline{A}\right)^+ = \overline{A^+}, \quad \left(A^*\right)^+ = \left(A^+\right)^* .</math>
*: <math>\left(A^\textsf{T}\right)^+ = \left(A^+\right)^\textsf{T}</math>, <math>\left(\overline{A}\right)^+ = \overline{A^+}</math>, <math>\left(A^*\right)^+ = \left(A^+\right)^*</math>. * The pseudoinverse of a scalar multiple of {{tmath| A }} is the reciprocal multiple of {{tmath| A^+ }}:<math display="block">\left(\alpha A\right)^+ = \alpha^{-1} A^+</math> for {{tmath| \alpha \neq 0 }}; otherwise, <math>\left(0 A\right)^+ = 0 A^+ = 0 A^\operatorname{T}</math>, or <math>0^+=0^\operatorname{T}</math>.
* The kernel and image of the pseudoinverse coincide with those of the conjugate transpose: <math>\ker\left(A^+\right) = \ker\left(A^*\right)</math> and <math>\operatorname{ran}\left(A^+\right) = \operatorname{ran}\left(A^*\right)</math>.
* The pseudoinverse of a scalar multiple of {{tmath| A }} is the reciprocal multiple of {{tmath| A^+ }}:
*: <math>\left(\alpha A\right)^+ = \alpha^{-1} A^+</math> for {{tmath| \alpha \neq 0 }}.


====Identities==== ====Identities====
The following identities can be used to cancel certain subexpressions or expand expressions involving pseudoinverses. Proofs for these properties can be found in the ]. The following identity formula can be used to cancel or expand certain subexpressions involving pseudoinverses:
<math display="block">\begin{alignat}{3} <math display="block">
A^+ ={}& A^+ && A^{+*} && A^* \\ A = {}A{}A^*{}A^{+*}{} = {}A^{+*}{}A^*{}A.
</math>
={}& A^* && A^{+*} && A^+, \\
Equivalently, substituting <math>A^+</math> for <math>A</math> gives
A ={}& A^{+*} && A^* && A \\
<math display="block">
={}& A && A^* && A^{+*}, \\
A^* ={}& A^* && A && A^+ \\ A^+ ={}A^+{}A^{+*}{}A^*{} = {}A^*{}A^{+*}{}A^+,
</math>
={}& A^+ && A && A^*.
while substituting <math>A^*</math> for <math>A</math> gives
\end{alignat}</math>
<math display="block">
A^* ={}A^*{}A{}A^+{}={}A^+{}A{}A^*.
</math>


===Reduction to Hermitian case=== ===Reduction to Hermitian case===
Line 74: Line 69:
as {{tmath| A^*A }} and {{tmath| A A^* }} are Hermitian. as {{tmath| A^*A }} and {{tmath| A A^* }} are Hermitian.


===Pseudoinverse of products===
===Products===

Suppose {{tmath| A \in \mathbb{k}^{m\times n},\ B \in \mathbb{k}^{n\times p} }}. Then the following are equivalent:<ref>{{Cite journal|last=Greville|first=T. N. E.|date=1966-10-01|title=Note on the Generalized Inverse of a Matrix Product|url=https://epubs.siam.org/doi/10.1137/1008107|journal=SIAM Review|volume=8|issue=4|pages=518–521|doi=10.1137/1008107|issn=0036-1445}}</ref>
The equality {{tmath|1= (AB)^+ = B^+ A^+ }} does not hold in general. Rather, suppose {{tmath| A \in \mathbb{K}^{m\times n},\ B \in \mathbb{K}^{n\times p} }}. Then the following are equivalent:<ref>{{Cite journal|last=Greville|first=T. N. E.|date=1966-10-01|title=Note on the Generalized Inverse of a Matrix Product|url=https://epubs.siam.org/doi/10.1137/1008107|journal=SIAM Review|volume=8|issue=4|pages=518–521|doi=10.1137/1008107|bibcode=1966SIAMR...8..518G |issn=0036-1445}}</ref>


# {{tmath|1= (AB)^+ = B^+ A^+ }} # <math display="inline">(AB)^+ = B^+ A^+</math>
# <math>A^+ A BB^* A^* = BB^* A^* </math> and <math>BB^+ A^* A B = A^* A B</math>
# <math display="inline">\begin{aligned}
# <math display="inline">\left(A^+ A BB^*\right)^* = A^+ A BB^*</math> and <math>\left(A^* A BB^+\right)^* = A^* A BB^+</math>
A^+ A BB^* A^* & = BB^* A^*, \\
BB^+ A^* A B & = A^* A B. # <math display="inline">A^+ A BB^* A^* A BB^+ = BB^* A^* A</math>
# <math display="inline">A^+ A B = B (AB)^+ AB </math> and <math>BB^+ A^* = A^* A B (AB)^+</math>.
\end{aligned}</math>
# <math>\begin{aligned}
\left(A^+ A BB^*\right)^* & = A^+ A BB^*, \\
\left(A^* A BB^+\right)^* & = A^* A BB^+.
\end{aligned}</math>
# <math>A^+ A BB^* A^* A BB^+ = BB^* A^* A</math>
# <math>\begin{aligned}
A^+ A B & = B (AB)^+ AB, \\
BB^+ A^* & = A^* A B (AB)^+.
\end{aligned}</math>


The following are sufficient conditions for {{tmath|1= (AB)^+ = B^+ A^+ }}: The following are sufficient conditions for {{tmath|1= (AB)^+ = B^+ A^+ }}:
# {{tmath| A }} has orthonormal columns (then <math>A^*A = A^+ A = I_n</math>), &nbsp; or # {{tmath| A }} has orthonormal columns (then <math>A^*A = A^+ A = I_n</math>), or
# {{tmath| B }} has orthonormal rows (then <math>BB^* = BB^+ = I_n</math>), &nbsp; or # {{tmath| B }} has orthonormal rows (then <math>BB^* = BB^+ = I_n</math>), or
# {{tmath| A }} has linearly independent columns (then <math>A^+ A = I</math> ) and {{tmath| B }} has linearly independent rows (then <math>BB^+ = I</math>), &nbsp; or # {{tmath| A }} has linearly independent columns (then <math>A^+ A = I</math> ) and {{tmath| B }} has linearly independent rows (then <math>BB^+ = I</math>), &nbsp; or
# <math>B = A^*</math>, or # <math>B = A^*</math>, or
Line 102: Line 89:
# <math>(A^+ A) (BB^+) = (BB^+) (A^+ A)</math> # <math>(A^+ A) (BB^+) = (BB^+) (A^+ A)</math>


The last sufficient condition yields the equalities The fourth sufficient condition yields the equalities
<math display="block">\begin{align} <math display="block">\begin{align}
\left(A A^*\right)^+ &= A^{+*} A^+, \\ \left(A A^*\right)^+ &= A^{+*} A^+, \\
\left(A^* A\right)^+ &= A^+ A^{+*}. \left(A^* A\right)^+ &= A^+ A^{+*}.
\end{align}</math> \end{align}</math>


NB: The equality {{tmath|1= (AB)^+ = B^+ A^+ }} does not hold in general. Here is a counterexample where {{tmath|1= (AB)^+ \neq B^+ A^+ }}:
See the counterexample:


<math display="block">\Biggl( \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} \Biggr)^+ = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}^+ = \begin{pmatrix} <math display="block">\Biggl( \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix} \Biggr)^+ = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}^+ = \begin{pmatrix}
\tfrac12 & 0 \\ \tfrac12 & 0 \end{pmatrix} \quad \neq \quad \begin{pmatrix} \tfrac12 & 0 \\ \tfrac12 & 0 \end{pmatrix} \quad \neq \quad \begin{pmatrix}
\tfrac14 & 0 \\ \tfrac14 & 0 \end{pmatrix} = \begin{pmatrix} 0 & \tfrac12 \\ 0 & \tfrac12 \end{pmatrix} \begin{pmatrix} \tfrac12 & 0 \\ \tfrac12 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}^+ \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}^+ </math> \tfrac14 & 0 \\ \tfrac14 & 0 \end{pmatrix} = \begin{pmatrix} 0 & \tfrac12 \\ 0 & \tfrac12 \end{pmatrix} \begin{pmatrix} \tfrac12 & 0 \\ \tfrac12 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 1 & 1 \end{pmatrix}^+ \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}^+</math>


===Projectors=== ===Projectors===
Line 120: Line 106:
* {{tmath| P }} is the ] onto the ] of {{tmath| A }} (which equals the ] of the kernel of {{tmath| A^* }}). * {{tmath| P }} is the ] onto the ] of {{tmath| A }} (which equals the ] of the kernel of {{tmath| A^* }}).
* {{tmath| Q }} is the orthogonal projector onto the range of {{tmath| A^* }} (which equals the orthogonal complement of the kernel of {{tmath| A }}). * {{tmath| Q }} is the orthogonal projector onto the range of {{tmath| A^* }} (which equals the orthogonal complement of the kernel of {{tmath| A }}).
* <math>(I - Q) = \left(I - A^+A\right)</math> is the orthogonal projector onto the kernel of {{tmath| A }}. * <math>I - Q = I - A^+A</math> is the orthogonal projector onto the kernel of {{tmath| A }}.
* <math>(I - P) = \left(I - A A^+\right)</math> is the orthogonal projector onto the kernel of {{tmath| A^* }}.<ref name="GvL1996"/> * <math>I - P = I - A A^+</math> is the orthogonal projector onto the kernel of {{tmath| A^* }}.<ref name="GvL1996"/>


The last two properties imply the following identities: The last two properties imply the following identities:
Line 127: Line 113:
* <math>A^*\left(I - A A^+\right) = \left(I - A^+A\right)A^* = 0</math> * <math>A^*\left(I - A A^+\right) = \left(I - A^+A\right)A^* = 0</math>


Another property is the following: if {{tmath| A \in \mathbb{k}^{n\times n} }} is Hermitian and idempotent (true if and only if it represents an orthogonal projection), then, for any matrix {{tmath| B\in \mathbb{k}^{m\times n} }} the following equation holds:<ref>{{cite journal|first1=Anthony A.|last1=Maciejewski|first2=Charles A.|last2=Klein|title=Obstacle Avoidance for Kinematically Redundant Manipulators in Dynamically Varying Environments|journal=International Journal of Robotics Research|volume=4|issue=3|pages=109–117|year=1985|doi=10.1177/027836498500400308|hdl=10217/536|s2cid=17660144|hdl-access=free}}</ref> Another property is the following: if {{tmath| A \in \mathbb{K}^{n\times n} }} is Hermitian and idempotent (true if and only if it represents an orthogonal projection), then, for any matrix {{tmath| B\in \mathbb{K}^{m\times n} }} the following equation holds:<ref>{{cite journal|first1=Anthony A.|last1=Maciejewski|first2=Charles A.|last2=Klein|title=Obstacle Avoidance for Kinematically Redundant Manipulators in Dynamically Varying Environments|journal=International Journal of Robotics Research|volume=4|issue=3|pages=109–117|year=1985|doi=10.1177/027836498500400308|hdl=10217/536|s2cid=17660144|hdl-access=free}}</ref>
<math display="block"> A(BA)^+ = (BA)^+</math> <math display="block">A(BA)^+ = (BA)^+</math>


This can be proven by defining matrices <math>C = BA</math>, <math>D = A(BA)^+</math>, and checking that {{tmath| D }} is indeed a pseudoinverse for {{tmath| C }} by verifying that the defining properties of the pseudoinverse hold, when {{tmath| A }} is Hermitian and idempotent. This can be proven by defining matrices <math>C = BA</math>, <math>D = A(BA)^+</math>, and checking that {{tmath| D }} is indeed a pseudoinverse for {{tmath| C }} by verifying that the defining properties of the pseudoinverse hold, when {{tmath| A }} is Hermitian and idempotent.


From the last property it follows that, if {{tmath| A \in \mathbb{k}^{n\times n} }} is Hermitian and idempotent, for any matrix {{tmath| B \in \mathbb{k}^{n\times m} }} From the last property it follows that, if {{tmath| A \in \mathbb{K}^{n\times n} }} is Hermitian and idempotent, for any matrix {{tmath| B \in \mathbb{K}^{n\times m} }}
<math display="block">(AB)^+A = (AB)^+</math> <math display="block">(AB)^+A = (AB)^+</math>


Line 138: Line 124:


===Geometric construction=== ===Geometric construction===
If we view the matrix as a linear map {{tmath| A:\mathbb{k}^n \to \mathbb{k}^m }} over the field {{tmath| \mathbb{k} }} then {{tmath| A^+: \mathbb{k}^m \to \mathbb{k}^n }} can be decomposed as follows. We write {{tmath| \oplus }} for the ], {{tmath| \perp }} for the ], {{tmath| \ker }} for the ] of a map, and {{tmath| \operatorname{ran} }} for the ] of a map. Notice that <math>\mathbb{k}^n = \left(\ker A\right)^\perp \oplus \ker A</math> and <math>\mathbb{k}^m = \operatorname{ran} A \oplus \left(\operatorname{ran} A\right)^\perp</math>. The restriction <math> A: \left(\ker A\right)^\perp \to \operatorname{ran} A</math> is then an isomorphism. This implies that {{tmath| A^+ }} on {{tmath| \operatorname{ran} A }} is the inverse of this isomorphism, and is zero on <math>\left(\operatorname{ran} A\right)^\perp . </math> If we view the matrix as a linear map {{tmath| A:\mathbb{K}^n \to \mathbb{K}^m }} over the field {{tmath| \mathbb{K} }} then {{tmath| A^+: \mathbb{K}^m \to \mathbb{K}^n }} can be decomposed as follows. We write {{tmath| \oplus }} for the ], {{tmath| \perp }} for the ], {{tmath| \ker }} for the ] of a map, and {{tmath| \operatorname{ran} }} for the image of a map. Notice that <math>\mathbb{K}^n = \left(\ker A\right)^\perp \oplus \ker A</math> and <math>\mathbb{K}^m = \operatorname{ran} A \oplus \left(\operatorname{ran} A\right)^\perp</math>. The restriction <math> A: \left(\ker A\right)^\perp \to \operatorname{ran} A</math> is then an isomorphism. This implies that {{tmath| A^+ }} on {{tmath| \operatorname{ran} A }} is the inverse of this isomorphism, and is zero on <math>\left(\operatorname{ran} A\right)^\perp .</math>


In other words: To find {{tmath| A^+b }} for given {{tmath| b }} in {{tmath| \mathbb{k}^m }}, first project {{tmath| b }} orthogonally onto the range of {{tmath| A }}, finding a point {{tmath| p(b) }} in the range. Then form {{tmath| A^{-1}(\{p(b)\}) }}, that is, find those vectors in {{tmath| \mathbb{k}^n }} that {{tmath| A }} sends to {{tmath| p(b) }}. This will be an affine subspace of {{tmath| \mathbb{k}^n }} parallel to the kernel of {{tmath| A }}. The element of this subspace that has the smallest length (that is, is closest to the origin) is the answer {{tmath| A^+b }} we are looking for. It can be found by taking an arbitrary member of {{tmath| A^{-1}(\{p(b)\}) }} and projecting it orthogonally onto the orthogonal complement of the kernel of {{tmath| A }}. In other words: To find {{tmath| A^+b }} for given {{tmath| b }} in {{tmath| \mathbb{K}^m }}, first project {{tmath| b }} orthogonally onto the range of {{tmath| A }}, finding a point {{tmath| p(b) }} in the range. Then form {{tmath| A^{-1}(\{p(b)\}) }}, that is, find those vectors in {{tmath| \mathbb{K}^n }} that {{tmath| A }} sends to {{tmath| p(b) }}. This will be an affine subspace of {{tmath| \mathbb{K}^n }} parallel to the kernel of {{tmath| A }}. The element of this subspace that has the smallest length (that is, is closest to the origin) is the answer {{tmath| A^+b }} we are looking for. It can be found by taking an arbitrary member of {{tmath| A^{-1}(\{p(b)\}) }} and projecting it orthogonally onto the orthogonal complement of the kernel of {{tmath| A }}.


This description is closely related to the ]. This description is closely related to the ].

===Subspaces===
<math display="block">\begin{align}
\ker\left(A^+\right) &= \ker\left(A^*\right) \\
\operatorname{ran}\left(A^+\right) &= \operatorname{ran}\left(A^*\right)
\end{align}</math>


===Limit relations=== ===Limit relations===
The pseudoinverse are limits: The pseudoinverse are limits:
<math display="block">A^+ = \lim_{\delta \searrow 0} \left(A^* A + \delta I\right)^{-1} A^* <math display="block">A^+ = \lim_{\delta \searrow 0} \left(A^* A + \delta I\right)^{-1} A^*
= \lim_{\delta \searrow 0} A^* \left(A A^* + \delta I\right)^{-1} = \lim_{\delta \searrow 0} A^* \left(A A^* + \delta I\right)^{-1}
</math> </math>
(see ]). These limits exist even if {{tmath| \left(A A^*\right)^{-1} }} or {{tmath| \left(A^*A\right)^{-1} }} do not exist.<ref name="GvL1996"/>{{rp|263}} (see ]). These limits exist even if {{tmath| \left(A A^*\right)^{-1} }} or {{tmath| \left(A^*A\right)^{-1} }} do not exist.<ref name="GvL1996"/>{{rp|263}}
Line 161: Line 141:


===Derivative=== ===Derivative===
The derivative of a real valued pseudoinverse matrix which has constant rank at a point {{tmath| x }} may be calculated in terms of the derivative of the original matrix:<ref>{{cite journal|title=The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate|first1=G. H.|last1=Golub|first2=V.|last2=Pereyra|journal=SIAM Journal on Numerical Analysis|volume=10|number=2|date=April 1973|pages=413–32|jstor=2156365|doi=10.1137/0710036|bibcode=1973SJNA...10..413G}}</ref> Let <math>x \mapsto A(x)</math> be a real-valued differentiable matrix function with constant rank at a point {{tmath| x_0 }}.
The derivative of <math>x \mapsto A^+(x)</math> at <math>x_0</math> may be calculated in terms of the derivative of <math>A</math> at <math>x_0</math>:<ref>{{cite journal|title=The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate|first1=G. H.|last1=Golub|first2=V.|last2=Pereyra|journal=SIAM Journal on Numerical Analysis|volume=10|number=2|date=April 1973|pages=413–32|jstor=2156365|doi=10.1137/0710036|bibcode=1973SJNA...10..413G}}</ref>
<math display="block"> <math display="block">
\frac{\mathrm d}{\mathrm d x} A^+(x) = \left.\frac{\mathrm d}{\mathrm d x}\right|_{x = x_0\!\!\!\!\!\!\!} A^+
-A^+ \left( \frac{\mathrm d}{\mathrm d x} A \right) A^+ ~+~ = -A^+ \left( \frac{\mathrm{d} A}{\mathrm d x} \right) A^+
A^+ A^{+\textsf{T}} \left(\frac{\mathrm d}{\mathrm d x} A^\textsf{T}\right) \left(I - A A^+\right) ~+~ ~+~ A^+ A^{+\top} \left(\frac{\mathrm{d} A^\top}{\mathrm{d} x} \right) \left(I - A A^+\right)
\left(I - A^+ A\right) \left(\frac{\text{d}}{\text{d} x} A^\textsf{T}\right) A^{+\textsf{T}} A^+ ~+~ \left(I - A^+ A\right) \left(\frac{\mathrm{d} A^\top}{\mathrm{d} x} \right) A^{+\top} A^+,
</math> </math>
where the functions <math>A</math>, <math>A^+</math> and derivatives on the right side are evaluated at <math>x_0</math> (that is, <math>A := A(x_0)</math>, <math>A^+ := A^+(x_0)</math>, etc.).
For a complex matrix, the transpose is replaced with the conjugate transpose.<ref>{{cite book |last1=Hjørungnes |first1=Are |title=Complex-valued matrix derivatives: with applications in signal processing and communications |date=2011 |publisher=Cambridge university press |location=New York |isbn=9780521192644 |page=52}}</ref> For a real-valued symmetric matrix, the ] is established.<ref>{{Cite journal| last1=Liu|first1=Shuangzhe|
last2= Trenkler|first2=Götz| last3=Kollo|first3=Tõnu|
last4=von Rosen|first4=Dietrich|
last5=Baksalary|first5=Oskar Maria|
date= 2023|
title= Professor Heinz Neudecker and matrix differential calculus|
journal= Statistical Papers|volume=65 |issue=4 |pages=2605–2639 | language=en|
doi= 10.1007/s00362-023-01499-w}}</ref>


==Examples== ==Examples==
Since for invertible matrices the pseudoinverse equals the usual inverse, only examples of non-invertible matrices are considered below. Since for invertible matrices the pseudoinverse equals the usual inverse, only examples of non-invertible matrices are considered below.


* For <math>A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},</math> the pseudoinverse is <math>A^+ = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.</math> The uniqueness of this pseudoinverse can be seen from the requirement <math>A^+ = A^+ A A^+</math>, since multiplication by a zero matrix would always produce a zero matrix.
{{unordered list
| For <math>A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},</math> the pseudoinverse is <math>A^+ = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.</math> (Generally, the pseudoinverse of a zero matrix is its transpose.) The uniqueness of this pseudoinverse can be seen from the requirement <math>A^+ = A^+ A A^+</math>, since multiplication by a zero matrix would always produce a zero matrix. * For <math>A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix},</math> the pseudoinverse is <math>A^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{pmatrix}</math>.
| For <math>A = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix},</math> the pseudoinverse is <math>A^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{pmatrix}.</math> : Indeed, <math>A\,A^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix}</math>, and thus <math>A\,A^+ A = \begin{pmatrix} 1 & 0 \\ 1 & 0\end{pmatrix} = A</math>. Similarly, <math>A^+A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}</math>, and thus <math>A^+A\,A^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{pmatrix} = A^+</math>.
: Note that {{tmath| A }} is neither injective nor surjective, and thus the pseudoinverse cannot be computed via <math>A^+ = \left(A^* A\right)^{-1} A^*</math> nor <math>A^+ = A^* \left( A A^*\right)^{-1}</math>, as <math>A^* A</math> and <math>A A^*</math> are both singular, and furthermore <math>A^+</math> is neither a left nor a right inverse.
: Nonetheless, the pseudoinverse can be computed via SVD observing that <math>A=\sqrt2 \left(\frac{\mathbf e_1+\mathbf e_2}{\sqrt2}\right) \mathbf e_1^*</math>, and thus <math>A^+=\frac{1}{\sqrt2} \,\mathbf e_1 \left(\frac{\mathbf e_1+\mathbf e_2}{\sqrt2}\right)^*</math>.
* For <math>A = \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix},</math> <math>A^+ = \begin{pmatrix} \frac{1}{2} & -\frac{1}{2} \\ 0 & 0 \end{pmatrix}.</math>
* For <math>A = \begin{pmatrix} 1 & 0 \\ 2 & 0 \end{pmatrix},</math> <math>A^+ = \begin{pmatrix} \frac{1}{5} & \frac{2}{5} \\ 0 & 0 \end{pmatrix}</math>. The denominators are here <math>5 = 1^2 + 2^2</math>.
* For <math>A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix},</math> <math>A^+ = \begin{pmatrix} \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} \end{pmatrix}.</math>
* For <math>A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix},</math> the pseudoinverse is <math>A^+ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \end{pmatrix}</math>.
: For this matrix, the ] exists and thus equals <math>A^+</math>, indeed, <math>A^+A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.</math>


Indeed, <math>A\,A^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix},</math> and thus <math>A\,A^+ A = \begin{pmatrix} 1 & 0 \\ 1 & 0\end{pmatrix} = A.</math>

Similarly, <math>A^+A = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},</math> and thus <math>A^+A\,A^+ = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \\ 0 & 0 \end{pmatrix} = A^+.</math>
| For <math>A = \begin{pmatrix} 1 & 0 \\ -1 & 0 \end{pmatrix}, </math> <math>A^+ = \begin{pmatrix} \frac{1}{2} & -\frac{1}{2} \\ 0 & 0 \end{pmatrix}.</math>
| For <math>A = \begin{pmatrix} 1 & 0 \\ 2 & 0 \end{pmatrix}, </math> <math>A^+ = \begin{pmatrix} \frac{1}{5} & \frac{2}{5} \\ 0 & 0 \end{pmatrix}.</math> (The denominators are <math>5 = 1^2 + 2^2</math>.)
| For <math>A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, </math> <math>A^+ = \begin{pmatrix} \frac{1}{4} & \frac{1}{4} \\ \frac{1}{4} & \frac{1}{4} \end{pmatrix}.</math>
| For <math>A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{pmatrix},</math> the pseudoinverse is <math>A^+ = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{1}{2} & \frac{1}{2} \end{pmatrix}.</math>

For this matrix, the ] exists and thus equals <math>A^+</math>, indeed, <math>A^+A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.</math>
}}


==Special cases== ==Special cases==
Line 191: Line 178:
It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar {{tmath| x }} is zero if {{tmath| x }} is zero and the reciprocal of {{tmath| x }} otherwise: It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar {{tmath| x }} is zero if {{tmath| x }} is zero and the reciprocal of {{tmath| x }} otherwise:
<math display="block">x^+ = \begin{cases} <math display="block">x^+ = \begin{cases}
0, & \mbox{if }x = 0; \\ 0, & \mbox{if }x = 0; \\
x^{-1}, & \mbox{otherwise}. x^{-1}, & \mbox{otherwise}.
\end{cases}</math> \end{cases}</math>


===Vectors=== ===Vectors===
The pseudoinverse of the null (all zero) vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude: The pseudoinverse of the null (all zero) vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:

<math display="block">\vec{x}^+ = \begin{cases} <math display="block">\vec{x}^+ = \begin{cases}
\vec{0}^\textsf{T}, & \mbox{if } \vec{x} = \vec{0}; \\ \vec{0}^\operatorname{T}, & \text{if } \vec{x} = \vec{0}; \\
\dfrac{\vec{x}^*}{\vec{x}^* \vec{x}}, & \mbox{otherwise}. \vec{x}^* / (\vec{x}^* \vec{x}), & \text{otherwise}.
\end{cases}</math> \end{cases}</math>

=== Diagonal matrices ===
The pseudoinverse of a squared diagonal matrix is obtained by taking the reciprocal of the nonzero diagonal elements. Formally, if <math>D</math> is a squared diagonal matrix with <math>D=\tilde D\oplus \mathbf 0_{k\times k}</math> and <math>\tilde D>0</math>, then <math>D^+=\tilde D^{-1}\oplus \mathbf 0_{k\times k}</math>. More generally, if <math>A</math> is any <math>m\times n</math> rectangular matrix whose only nonzero elements are on the diagonal, meaning <math>A_{ij}=\delta_{ij} a_i</math>, <math>a_i\in\mathbb K</math>, then <math>A^+</math> is a <math>n\times m</math> rectangular matrix whose diagonal elements are the reciprocal of the original ones, that is, <math>A_{ii}\neq 0\implies A^+_{ii}=1/A_{ii}</math>.


===Linearly independent columns=== ===Linearly independent columns===
If the '''columns''' of {{tmath| A }} are ] If the rank of {{tmath| A }} is identical to the number of columns, {{tmath| n }}, (for {{tmath| n \le m}},) there are {{tmath| n }} ] columns, and {{tmath| A^*A }} is invertible. In this case, an explicit formula is:{{sfn|Ben-Israel|Greville|2003}}
<math display="block">A^+ = \left(A^*A\right)^{-1}A^*.</math>
(so that {{tmath| m \ge n }}), then {{tmath| A^*A }} is invertible. In this case, an explicit formula is:{{sfn|Ben-Israel|Greville|2003}}
<math display="block">A^+ = \left(A^*A\right)^{-1}A^*</math>.


It follows that {{tmath| A^+ }} is then a left inverse of {{tmath| A }}: &nbsp; <math>A^+ A = I_n</math>. It follows that {{tmath| A^+ }} is then a left inverse of {{tmath| A }}: &nbsp; <math>A^+ A = I_n</math>.


===Linearly independent rows=== ===Linearly independent rows===
If the '''rows''' of {{tmath| A }} are linearly independent (so that {{tmath| m \le n }}), then {{tmath| A A^* }} is invertible. In this case, an explicit formula is: If the rank of {{tmath| A }} is identical to the number of rows, {{tmath| m }}, (for {{tmath| m \le n}},) there are {{tmath| m }} ] rows, and {{tmath| AA^* }} is invertible. In this case, an explicit formula is:
<math display="block">A^+ = A^*\left(A A^*\right)^{-1}</math>. <math display="block">A^+ = A^*\left(A A^*\right)^{-1}.</math>


It follows that {{tmath| A^+ }} is a right inverse of {{tmath| A }}: &nbsp; <math>A A^+ = I_m</math>. It follows that {{tmath| A^+ }} is a right inverse of {{tmath| A }}: &nbsp; <math>A A^+ = I_m</math>.
Line 220: Line 210:


=== Normal matrices === === Normal matrices ===
If {{tmath| A }} is a ]; that is, it commutes with its conjugate transpose; then its pseudoinverse can be computed by diagonalizing it, mapping all nonzero eigenvalues to their inverses, and mapping zero eigenvalues to zero. A corollary is that {{tmath| A }} commuting with its transpose implies that it commutes with its pseudoinverse. If {{tmath| A }} is ], that is, it commutes with its conjugate transpose, then its pseudoinverse can be computed by diagonalizing it, mapping all nonzero eigenvalues to their inverses, and mapping zero eigenvalues to zero. A corollary is that {{tmath| A }} commuting with its transpose implies that it commutes with its pseudoinverse.

=== EP matrices ===
A (square) matrix {{tmath| A }} is said to be an EP matrix if it commutes with its pseudoinverse. In such cases (and only in such cases), it is possible to obtain the pseudoinverse as a polynomial in {{tmath| A }}. A polynomial <math>p(t)</math> such that <math>A^+=p(A)</math> can be easily obtained from the characteristic polynomial of {{tmath| A }} or, more generally, from any annihilating polynomial of {{tmath| A }}.<ref name="Bajo">{{cite journal | author=Bajo, I. | title=Computing Moore–Penrose Inverses with Polynomials in Matrices | journal=] | volume=128 | issue=5 | pages=446–456 | year=2021 | doi=10.1080/00029890.2021.1886840| hdl=11093/6146 | hdl-access=free }}</ref>


===Orthogonal projection matrices=== ===Orthogonal projection matrices===
This is a special case of a Normal matrix with eigenvalues 0 and 1. If {{tmath| A }} is an orthogonal projection matrix, that is, <math>A = A^*</math> and <math>A^2 = A</math>, then the pseudoinverse trivially coincides with the matrix itself: This is a special case of a normal matrix with eigenvalues 0 and 1. If {{tmath| A }} is an orthogonal projection matrix, that is, <math>A = A^*</math> and <math>A^2 = A</math>, then the pseudoinverse trivially coincides with the matrix itself:
<math display="block">A^+ = A.</math> <math display="block">A^+ = A.</math>


===Circulant matrices=== ===Circulant matrices===
For a ] {{tmath| C }}, the singular value decomposition is given by the ], that is, the singular values are the Fourier coefficients. Let {{tmath| \mathcal{F} }} be the ], then<ref name="Stallings1972">{{cite journal | last1=Stallings | first1=W. T. | author-link=W. T. Stallings | title=The Pseudoinverse of an ''r''-Circulant Matrix | journal=] | volume=34 | issue=2 | pages=385–88 | year=1972 | doi=10.2307/2038377 | last2=Boullion | first2=T. L.| jstor=2038377 }}</ref> For a ] {{tmath| C }}, the singular value decomposition is given by the ], that is, the singular values are the Fourier coefficients. Let {{tmath| \mathcal{F} }} be the ]; then<ref name="Stallings1972">{{cite journal | last1=Stallings | first1=W. T. | author-link=W. T. Stallings | title=The Pseudoinverse of an ''r''-Circulant Matrix | journal=] | volume=34 | issue=2 | pages=385–88 | year=1972 | doi=10.2307/2038377 | last2=Boullion | first2=T. L.| jstor=2038377 }}</ref>
<math display="block">\begin{align} <math display="block">\begin{align}
C &= \mathcal{F}\cdot\Sigma\cdot\mathcal{F}^* \\ C &= \mathcal{F}\cdot\Sigma\cdot\mathcal{F}^*,\\
C^+ &= \mathcal{F}\cdot\Sigma^+\cdot\mathcal{F}^* C^+ &= \mathcal{F}\cdot\Sigma^+\cdot\mathcal{F}^*.
\end{align}</math> \end{align}</math>


==Construction== ==Construction==
===Rank decomposition=== ===Rank decomposition===
Let {{tmath| r \le \min(m, n) }} denote the ] of {{tmath| A \in \mathbb{k}^{m\times n} }}. Then {{tmath| A }} can be ] as Let {{tmath| r \le \min(m, n) }} denote the ] of {{tmath| A \in \mathbb{K}^{m\times n} }}. Then {{tmath| A }} can be ] as
<math>A = BC</math> where {{tmath| B \in \mathbb{k}^{m\times r} }} and {{tmath| C \in \mathbb{k}^{r\times n} }} are of rank {{tmath| r }}. Then <math>A^+ = C^+B^+ = C^*\left(CC^*\right)^{-1}\left(B^*B\right)^{-1}B^*</math>. <math>A = BC</math> where {{tmath| B \in \mathbb{K}^{m\times r} }} and {{tmath| C \in \mathbb{K}^{r\times n} }} are of rank {{tmath| r }}. Then <math>A^+ = C^+B^+ = C^*\left(CC^*\right)^{-1}\left(B^*B\right)^{-1}B^*</math>.


===The QR method=== ===The QR method===
For <math>\mathbb{k} \in \{ \mathbb{R}, \mathbb{C}\}</math> computing the product {{tmath| A A^* }} or {{tmath| A^*A }} and their inverses explicitly is often a source of numerical rounding errors and computational cost in practice. An alternative approach using the ] of {{tmath| A }} may be used instead. For <math>\mathbb{K} \in \{ \mathbb{R}, \mathbb{C}\}</math> computing the product {{tmath| A A^* }} or {{tmath| A^*A }} and their inverses explicitly is often a source of numerical rounding errors and computational cost in practice. An alternative approach using the ] of {{tmath| A }} may be used instead.


Consider the case when {{tmath| A }} is of full column rank, so that <math>A^+ = \left(A^*A\right)^{-1}A^*</math>. Then the ] <math>A^*A = R^*R</math>, where {{tmath| R }} is an ], may be used. Multiplication by the inverse is then done easily by solving a system with multiple right-hand sides, Consider the case when {{tmath| A }} is of full column rank, so that <math>A^+ = \left(A^*A\right)^{-1}A^*</math>. Then the ] <math>A^*A = R^*R</math>, where {{tmath| R }} is an ], may be used. Multiplication by the inverse is then done easily by solving a system with multiple right-hand sides,
<math display="block">A^+ = \left(A^*A\right)^{-1}A^* \quad \Leftrightarrow \quad \left(A^*A\right)A^+ = A^* \quad \Leftrightarrow \quad R^*RA^+ = A^* </math> <math display="block">A^+ = \left(A^*A\right)^{-1}A^* \quad \Leftrightarrow \quad \left(A^*A\right)A^+ = A^* \quad \Leftrightarrow \quad R^*RA^+ = A^*</math>


which may be solved by ] followed by ]. which may be solved by ] followed by ].


The Cholesky decomposition may be computed without forming {{tmath| A^*A }} explicitly, by alternatively using the ] of <math> A = Q R</math>, where <math>Q</math> has orthonormal columns, <math> Q^*Q = I </math>, and {{tmath| R }} is upper triangular. Then The Cholesky decomposition may be computed without forming {{tmath| A^*A }} explicitly, by alternatively using the ] of <math>A = Q R</math>, where <math>Q</math> has orthonormal columns, <math>Q^*Q = I</math>, and {{tmath| R }} is upper triangular. Then
<math display="block"> A^*A \,=\, (Q R)^*(Q R) \,=\, R^*Q^*Q R \,=\, R^*R ,</math> <math display="block">A^*A\, =\, (Q R)^*(Q R) \,=\, R^*Q^*Q R \,=\, R^*R ,</math>


so {{tmath| R }} is the Cholesky factor of {{tmath| A^*A }}. so {{tmath| R }} is the Cholesky factor of {{tmath| A^*A }}.


The case of full row rank is treated similarly by using the formula <math>A^+ = A^*\left(A A^*\right)^{-1}</math> and using a similar argument, swapping the roles of {{tmath| A }} and {{tmath| A^* }}. The case of full row rank is treated similarly by using the formula <math>A^+ = A^*\left(A A^*\right)^{-1}</math> and using a similar argument, swapping the roles of {{tmath| A }} and {{tmath| A^* }}.

===Using polynomials in matrices===
For an arbitrary {{tmath| A \in \mathbb{K}^{m\times n} }}, one has that <math>A^*A</math> is normal and, as a consequence, an EP matrix. One can then find a polynomial <math>p(t)</math> such that <math>(A^*A)^+=p(A^*A)</math>. In this case one has that the pseudoinverse of {{tmath| A}} is given by<ref name="Bajo" />
<math display="block">A^+= p(A^*A)A^*= A^*p(AA^*).</math>


===Singular value decomposition (SVD)=== ===Singular value decomposition (SVD)===
A computationally simple and accurate way to compute the pseudoinverse is by using the ].{{sfn|Ben-Israel|Greville|2003}}<ref name="GvL1996"/><ref name="SLEandPI"></ref> If <math>A = U\Sigma V^*</math> is the singular value decomposition of {{tmath| A }}, then <math>A^+ = V\Sigma^+ U^*</math>. For a ] such as {{tmath| \Sigma }}, we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, leaving the zeros in place, and then transposing the matrix. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the ] or ] function {{mono|pinv}}, the tolerance is taken to be {{math|1=''t'' = ε⋅max(''m'', ''n'')⋅max(Σ)}}, where ε is the ]. A computationally simple and accurate way to compute the pseudoinverse is by using the ].{{sfn|Ben-Israel|Greville|2003}}<ref name="GvL1996"/><ref name="SLEandPI"></ref> If <math>A = U\Sigma V^*</math> is the singular value decomposition of {{tmath| A }}, then <math>A^+ = V\Sigma^+ U^*</math>. For a ] such as {{tmath| \Sigma }}, we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, leaving the zeros in place. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the ] or ] function {{mono|pinv}}, the tolerance is taken to be {{math|1=''t'' = ε⋅max(''m'', ''n'')⋅max(Σ)}}, where ε is the ].


The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix–matrix multiplication, even if a state-of-the art implementation (such as that of ]) is used. The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix–matrix multiplication, even if a state-of-the art implementation (such as that of ]) is used.
Line 261: Line 258:


===Block matrices=== ===Block matrices===
] exist for calculating the pseudoinverse of block structured matrices. ] exist for calculating the pseudoinverse of block-structured matrices.


===The iterative method of Ben-Israel and Cohen=== ===The iterative method of Ben-Israel and Cohen===
Another method for computing the pseudoinverse (cf. ]) uses the recursion Another method for computing the pseudoinverse (cf. ]) uses the recursion
<math display="block"> A_{i+1} = 2A_i - A_i A A_i, </math> <math display="block">A_{i+1} = 2A_i - A_i A A_i,</math>


which is sometimes referred to as hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of {{tmath| A }} if it is started with an appropriate {{tmath| A_0 }} satisfying <math>A_0 A = \left(A_0 A\right)^*</math>. The choice <math>A_0 = \alpha A^*</math> (where <math>0 < \alpha < 2/\sigma^2_1(A)</math>, with {{tmath| \sigma_1(A) }} denoting the largest singular value of {{tmath| A }}) <ref>{{cite journal | last1=Ben-Israel | first1=Adi | last2=Cohen | first2=Dan | title=On Iterative Computation of Generalized Inverses and Associated Projections | journal=SIAM Journal on Numerical Analysis | volume=3 | issue=3 | pages=410–19 | year=1966 | jstor=2949637 | doi=10.1137/0703035 | bibcode=1966SJNA....3..410B }}</ref> has been argued not to be competitive to the method using the SVD mentioned above, because even for moderately ill-conditioned matrices it takes a long time before {{tmath| A_i }} enters the region of quadratic convergence.<ref>{{cite journal | last1=Söderström | first1=Torsten | last2=Stewart | first2=G. W. | title=On the Numerical Properties of an Iterative Method for Computing the Moore–Penrose Generalized Inverse | journal=SIAM Journal on Numerical Analysis | volume=11 | issue=1 | pages=61–74 | year=1974 | jstor=2156431 | doi=10.1137/0711008 | bibcode=1974SJNA...11...61S }}</ref> However, if started with {{tmath| A_0 }} already close to the Moore–Penrose inverse and <math>A_0 A = \left(A_0 A\right)^*</math>, for example <math>A_0 := \left(A^* A + \delta I\right)^{-1} A^*</math>, convergence is fast (quadratic). which is sometimes referred to as hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of {{tmath| A }} if it is started with an appropriate {{tmath| A_0 }} satisfying <math>A_0 A = \left(A_0 A\right)^*</math>. The choice <math>A_0 = \alpha A^*</math> (where <math>0 < \alpha < 2/\sigma^2_1(A)</math>, with {{tmath| \sigma_1(A) }} denoting the largest singular value of {{tmath| A }})<ref>{{cite journal | last1=Ben-Israel | first1=Adi | last2=Cohen | first2=Dan | title=On Iterative Computation of Generalized Inverses and Associated Projections | journal=SIAM Journal on Numerical Analysis | volume=3 | issue=3 | pages=410–19 | year=1966 | jstor=2949637 | doi=10.1137/0703035 | bibcode=1966SJNA....3..410B }}</ref> has been argued not to be competitive to the method using the SVD mentioned above, because even for moderately ill-conditioned matrices it takes a long time before {{tmath| A_i }} enters the region of quadratic convergence.<ref>{{cite journal | last1=Söderström | first1=Torsten | last2=Stewart | first2=G. W. | title=On the Numerical Properties of an Iterative Method for Computing the Moore–Penrose Generalized Inverse | journal=SIAM Journal on Numerical Analysis | volume=11 | issue=1 | pages=61–74 | year=1974 | jstor=2156431 | doi=10.1137/0711008 | bibcode=1974SJNA...11...61S }}</ref> However, if started with {{tmath| A_0 }} already close to the Moore–Penrose inverse and <math>A_0 A = \left(A_0 A\right)^*</math>, for example <math>A_0 := \left(A^* A + \delta I\right)^{-1} A^*</math>, convergence is fast (quadratic).


===Updating the pseudoinverse=== ===Updating the pseudoinverse===
For the cases where {{tmath| A }} has full row or column rank, and the inverse of the correlation matrix ({{tmath| A A^* }} for {{tmath| A }} with full row rank or {{tmath| A^*A }} for full column rank) is already known, the pseudoinverse for matrices related to {{tmath| A }} can be computed by applying the ] to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.<ref name="G1992">{{Cite thesis |first= Tino |last=Gramß |title= Worterkennung mit einem künstlichen neuronalen Netzwerk |type=PhD dissertation |publisher= Georg-August-Universität zu Göttingen |year = 1992 | oclc = 841706164 }}</ref><ref name="EMTIYAZ2008">{{cite document |first=Mohammad |last=Emtiyaz |title=Updating Inverse of a Matrix When a Column is Added/Removed |date=February 27, 2008 For the cases where {{tmath| A }} has full row or column rank, and the inverse of the correlation matrix ({{tmath| A A^* }} for {{tmath| A }} with full row rank or {{tmath| A^*A }} for full column rank) is already known, the pseudoinverse for matrices related to {{tmath| A }} can be computed by applying the ] to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.<ref name="G1992">{{Cite thesis |first= Tino |last=Gramß |title= Worterkennung mit einem künstlichen neuronalen Netzwerk |type=PhD dissertation |publisher= Georg-August-Universität zu Göttingen |year = 1992 | oclc = 841706164 }}</ref><ref name="EMTIYAZ2008">{{cite web |first=Mohammad |last=Emtiyaz |title=Updating Inverse of a Matrix When a Column is Added/Removed |date=February 27, 2008 |url=https://emtiyaz.github.io/Writings/OneColInv.pdf }}</ref>
|url=https://emtiyaz.github.io/Writings/OneColInv.pdf }}</ref>


Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.<ref>{{cite journal|last=Meyer, Jr.|first=Carl D.|title=Generalized inverses and ranks of block matrices|journal=SIAM J. Appl. Math.|volume=25|issue=4|date=1973|pages=597–602|doi=10.1137/0125057}}</ref><ref>{{cite journal|last=Meyer, Jr.|first=Carl D.|title=Generalized inversion of modified matrices|journal=SIAM J. Appl. Math.|volume=24|issue=3|date=1973|pages=315–23|doi=10.1137/0124033}}</ref> Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.<ref>{{cite journal|last=Meyer|first=Carl D. Jr.|title=Generalized inverses and ranks of block matrices|journal=SIAM J. Appl. Math.|volume=25|issue=4|date=1973|pages=597–602|doi=10.1137/0125057}}</ref><ref>{{cite journal|last=Meyer|first=Carl D. Jr.|title=Generalized inversion of modified matrices|journal=SIAM J. Appl. Math.|volume=24|issue=3|date=1973|pages=315–23|doi=10.1137/0124033}}</ref>


===Software libraries=== ===Software libraries===
High-quality implementations of SVD, QR, and back substitution are available in ], such as ]. Writing one's own implementation of SVD is a major programming project that requires a significant ]. In special circumstances, such as ] or ], however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable. High-quality implementations of SVD, QR, and back substitution are available in standard libraries, such as ]. Writing one's own implementation of SVD is a major programming project that requires a significant ]. In special circumstances, such as ] or ], however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.


The Python package ] provides a pseudoinverse calculation through its functions <code>matrix.I</code> and <code>linalg.pinv</code>; its <code>pinv</code> uses the SVD-based algorithm. ] adds a function <code>scipy.linalg.pinv</code> that uses a least-squares solver. The Python package ] provides a pseudoinverse calculation through its functions <code>matrix.I</code> and <code>linalg.pinv</code>; its <code>pinv</code> uses the SVD-based algorithm. ] adds a function <code>scipy.linalg.pinv</code> that uses a least-squares solver.
Line 284: Line 280:
The ] provides a pseudoinverse through the standard package function <code>pinv</code> and the <code>pseudo_inverse()</code> method. The ] provides a pseudoinverse through the standard package function <code>pinv</code> and the <code>pseudo_inverse()</code> method.


In ], the LinearAlgebra package of the standard library provides an implementation of the Moore-Penrose inverse <code>pinv()</code> implemented via singular-value decomposition.<ref>{{cite web |url=https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.pinv |title=LinearAlgebra.pinv}}</ref> In ], the LinearAlgebra package of the standard library provides an implementation of the Moore–Penrose inverse <code>pinv()</code> implemented via singular-value decomposition.<ref>{{cite web |url=https://docs.julialang.org/en/v1/stdlib/LinearAlgebra/#LinearAlgebra.pinv |title=LinearAlgebra.pinv}}</ref>


==Applications== ==Applications==
Line 290: Line 286:
{{See also|Linear least squares (mathematics)}} {{See also|Linear least squares (mathematics)}}


The pseudoinverse provides a ] solution to a ].<ref name="Penrose1956">{{cite journal | last=Penrose | first=Roger | title=On best approximate solution of linear matrix equations | journal=] | volume=52 | pages=17–19 | year=1956 | issue=1 | doi=10.1017/S0305004100030929| bibcode=1956PCPS...52...17P }}</ref> The pseudoinverse provides a ] solution to a ].<ref name="Penrose1956">{{cite journal | last=Penrose | first=Roger | title=On best approximate solution of linear matrix equations | journal=] | volume=52 | pages=17–19 | year=1956 | issue=1 | doi=10.1017/S0305004100030929| bibcode=1956PCPS...52...17P | s2cid=122260851 }}</ref>
For {{tmath| A \in \mathbb{k}^{m\times n} }}, given a system of linear equations For {{tmath| A \in \mathbb{K}^{m\times n} }}, given a system of linear equations
<math display="block">A x = b,</math> <math display="block">A x = b,</math>


in general, a vector {{tmath| x }} that solves the system may not exist, or if one does exist, it may not be unique. The pseudoinverse solves the "least-squares" problem as follows: in general, a vector {{tmath| x }} that solves the system may not exist, or if one does exist, it may not be unique. More specifically, a solution exists if and only if <math>b</math> is in the image of <math>A</math>, and is unique if and only if <math>A</math> is injective. The pseudoinverse solves the "least-squares" problem as follows:


* {{tmath| \forall x \in \mathbb{k}^n }}, we have <math>\left\|Ax - b\right\|_2 \ge \left\|Az - b\right\|_2</math> where <math>z = A^+b</math> and <math>\|\cdot\|_2</math> denotes the ]. This weak inequality holds with equality if and only if <math>x = A^+b + \left(I - A^+A\right)w</math> for any vector {{tmath| w }}; this provides an infinitude of minimizing solutions unless {{tmath| A }} has full column rank, in which case {{tmath| \left(I - A^+A\right) }} is a zero matrix.<ref name=Planitz>{{cite journal|last=Planitz|first=M.|title=Inconsistent systems of linear equations|journal=Mathematical Gazette|volume=63|issue=425|date=October 1979|pages=181–85|doi=10.2307/3617890|jstor=3617890}}</ref> The solution with minimum Euclidean norm is {{tmath| z. }}<ref name=Planitz/> * {{tmath| \forall x \in \mathbb{K}^n }}, we have <math>\left\|Ax - b\right\|_2 \ge \left\|Az - b\right\|_2</math> where <math>z = A^+b</math> and <math>\|\cdot\|_2</math> denotes the ]. This weak inequality holds with equality if and only if <math>x = A^+b + \left(I - A^+A\right)w</math> for any vector {{tmath| w }}; this provides an infinitude of minimizing solutions unless {{tmath| A }} has full column rank, in which case {{tmath| \left(I - A^+A\right) }} is a zero matrix.<ref name=Planitz>{{cite journal|last=Planitz|first=M.|title=Inconsistent systems of linear equations|journal=Mathematical Gazette|volume=63|issue=425|date=October 1979|pages=181–85|doi=10.2307/3617890|jstor=3617890|s2cid=125601192 }}</ref> The solution with minimum Euclidean norm is {{tmath| z. }}<ref name=Planitz/>


This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let {{tmath| B \in \mathbb{k}^{m\times p} }}. This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let {{tmath| B \in \mathbb{K}^{m\times p} }}.


* {{tmath| \forall X \in \mathbb{k}^{n\times p} }}, we have <math> \|AX - B\|_{\mathrm{F}} \ge \|AZ -B\|_{\mathrm{F}}</math> where <math>Z = A^+B</math> and <math>\|\cdot\|_{\mathrm{F}} </math> denotes the ]. * {{tmath| \forall X \in \mathbb{K}^{n\times p} }}, we have <math>\|AX - B\|_{\mathrm{F}} \ge \|AZ -B\|_{\mathrm{F}}</math> where <math>Z = A^+B</math> and <math>\|\cdot\|_{\mathrm{F}}</math> denotes the ].


===Obtaining all solutions of a linear system=== ===Obtaining all solutions of a linear system===
Line 307: Line 303:
<math display="block">A x = b</math> <math display="block">A x = b</math>


has any solutions, they are all given by<ref name=James>{{cite journal|last=James|first=M.|title=The generalised inverse|journal=Mathematical Gazette|volume=62|issue=420|date=June 1978|pages=109–14|doi=10.1017/S0025557200086460}}</ref> has any solutions, they are all given by<ref name=James>{{cite journal|last=James|first=M.|title=The generalised inverse|journal=Mathematical Gazette|volume=62|issue=420|date=June 1978|pages=109–14|doi=10.1017/S0025557200086460|s2cid=126385532 }}</ref>


<math display="block">x = A^+ b + \leftw</math> <math display="block">x = A^+ b + \leftw</math>


for arbitrary vector {{tmath| w }}. Solution(s) exist if and only if <math>A A^+ b = b</math>.<ref name=James/> If the latter holds, then the solution is unique if and only if {{tmath| A }} has full column rank, in which case {{tmath| \left }} is a zero matrix. If solutions exist but {{tmath| A }} does not have full column rank, then we have an ], all of whose infinitude of solutions are given by this last equation. for arbitrary vector {{tmath| w }}. Solution(s) exist if and only if <math>A A^+ b = b</math>.<ref name=James/> If the latter holds, then the solution is unique if and only if {{tmath| A }} has full column rank, in which case {{tmath| I - A^+ A }} is a zero matrix. If solutions exist but {{tmath| A }} does not have full column rank, then we have an ], all of whose infinitude of solutions are given by this last equation.


===Minimum norm solution to a linear system=== ===Minimum norm solution to a linear system===
Line 319: Line 315:
* If <math>Ax = b</math> is satisfiable, the vector <math>z = A^+b</math> is a solution, and satisfies <math>\|z\|_2 \le \|x\|_2</math> for all solutions. * If <math>Ax = b</math> is satisfiable, the vector <math>z = A^+b</math> is a solution, and satisfies <math>\|z\|_2 \le \|x\|_2</math> for all solutions.


This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let {{tmath| B \in \mathbb{k}^{m\times p} }}. This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let {{tmath| B \in \mathbb{K}^{m\times p} }}.


* If <math>AX = B</math> is satisfiable, the matrix <math>Z = A^+B</math> is a solution, and satisfies <math>\|Z\|_{\mathrm{F}} \le \|X\|_{\mathrm{F}}</math> for all solutions. * If <math>AX = B</math> is satisfiable, the matrix <math>Z = A^+B</math> is a solution, and satisfies <math>\|Z\|_{\mathrm{F}} \le \|X\|_{\mathrm{F}}</math> for all solutions.
Line 325: Line 321:
===Condition number=== ===Condition number===
Using the pseudoinverse and a ], one can define a ] for any matrix: Using the pseudoinverse and a ], one can define a ] for any matrix:
<math display="block">\mbox{cond}(A) = \|A\| \left\|A^+\right\|. </math> <math display="block">\mbox{cond}(A) = \|A\| \left\|A^+\right\|.</math>


A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of {{tmath| A }} can lead to huge errors in the entries of the solution.<ref name=hagen/> A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of {{tmath| A }} can lead to huge errors in the entries of the solution.<ref name=hagen/>


==Generalizations== ==Generalizations==
The weighted pseudoinverse <ref>{{Cite journal|last=Price|first=Charles M.|date=1963-03-15|title=The Matrix Pseudoinverse and Minimal Variance Estimates|journal=SIAM Review|language=en|volume=6|issue=2|pages=115–120|doi=10.1137/1006029|issn=1095-7200}}</ref> generalizes the Moore-Penrose inverse between metric spaces with weight matrices in the domain and range. These weights are the identity for the standard Moore-Penrose inverse, which assumes an orthonormal basis in both spaces.
Besides for matrices over real and complex numbers, the conditions hold for matrices over ]s, also called "complex quaternions".<ref>{{cite arXiv |title=Matrix Theory over the Complex Quaternion Algebra | at=p.8, Theorem 3.5 |eprint=math/0004005 |ref=tian2000| last1=Tian | first1=Yongge | year=2000 }}</ref>


In order to solve more general least-squares problems, one can define Moore–Penrose inverses for all continuous linear operators {{tmath| A: H_1 \rarr H_2 }} between two ]s {{tmath| H_1 }} and {{tmath| H_2 }}, using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudoinverse in this sense.<ref name="hagen">{{cite book|first1=Roland|last1=Hagen|first2=Steffen|last2=Roch|first3=Bernd|last3=Silbermann|title=C*-algebras and Numerical Analysis|publisher=CRC Press|year=2001|chapter=Section 2.1.2}}</ref> Those that do are precisely the ones whose range is ] in {{tmath| H_2 }}. In order to solve more general least-squares problems, one can define Moore–Penrose inverses for all continuous linear operators {{tmath| A: H_1 \rarr H_2 }} between two ]s {{tmath| H_1 }} and {{tmath| H_2 }}, using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudoinverse in this sense.<ref name="hagen">{{cite book|first1=Roland|last1=Hagen|first2=Steffen|last2=Roch|first3=Bernd|last3=Silbermann|title=C*-algebras and Numerical Analysis|publisher=CRC Press|year=2001|chapter=Section 2.1.2}}</ref> Those that do are precisely the ones whose range is ] in {{tmath| H_2 }}.


A notion of pseudoinverse exists for matrices over an arbitrary ] equipped with an arbitrary ] ]. In this more general setting, a given matrix doesn't always have a pseudoinverse. The necessary and sufficient condition for a pseudoinverse to exist is that <math>\operatorname{rank}(A) = \operatorname{rank}\left(A^* A\right) = \operatorname{rank}\left(A A^*\right)</math> where <math>A^*</math> denotes the result of applying the involution operation to the transpose of <math>A</math>. When it does exist, it is unique.<ref>{{Cite journal|last=Pearl|first=Martin H.|date=1968-10-01|title=Generalized inverses of matrices with entries taken from an arbitrary field|url=https://dx.doi.org/10.1016%2F0024-3795%2868%2990028-1|journal=Linear Algebra and Its Applications|language=en|volume=1|issue=4|pages=571–587|doi=10.1016/0024-3795(68)90028-1|issn=0024-3795|doi-access=free}}</ref> '''Example''': Consider the field of complex numbers equipped with the ] (as opposed to the involution considered elsewhere in the article); do there exist matrices that fail to have pseudoinverses in this sense? Consider the matrix <math>A = \begin{bmatrix}1 & i\end{bmatrix}^\textsf{T}</math>. Observe that <math>\operatorname{rank}\left(A A^\textsf{T}\right) = 1</math> while <math>\operatorname{rank}\left(A^\textsf{T} A\right) = 0</math>. So this matrix doesn't have a pseudoinverse in this sense. A notion of pseudoinverse exists for matrices over an arbitrary ] equipped with an arbitrary ] ]. In this more general setting, a given matrix doesn't always have a pseudoinverse. The necessary and sufficient condition for a pseudoinverse to exist is that <math>\operatorname{rank}(A) = \operatorname{rank}\left(A^* A\right) = \operatorname{rank}\left(A A^*\right)</math>, where <math>A^*</math> denotes the result of applying the involution operation to the transpose of <math>A</math>. When it does exist, it is unique.<ref>{{Cite journal|last=Pearl|first=Martin H.|date=1968-10-01|title=Generalized inverses of matrices with entries taken from an arbitrary field|journal=Linear Algebra and Its Applications|language=en|volume=1|issue=4|pages=571–587|doi=10.1016/0024-3795(68)90028-1|issn=0024-3795|doi-access=free}}</ref> '''Example''': Consider the field of complex numbers equipped with the ] (as opposed to the involution considered elsewhere in the article); do there exist matrices that fail to have pseudoinverses in this sense? Consider the matrix <math>A = \begin{bmatrix}1 & i\end{bmatrix}^\operatorname{T}</math>. Observe that <math>\operatorname{rank}\left(A A^\operatorname{T}\right) = 1</math> while <math>\operatorname{rank}\left(A^\operatorname{T} A\right) = 0</math>. So this matrix doesn't have a pseudoinverse in this sense.


In ], a Moore–Penrose inverse may be defined on a ]. This abstract definition coincides with the one in linear algebra. In ], a Moore–Penrose inverse may be defined on a ]. This abstract definition coincides with the one in linear algebra.


==See also== ==See also==
* ]
* ] * ]
* ] * ]
Line 353: Line 348:
* {{cite book| first1 = Adi | last1 = Ben-Israel |author1-link=Adi Ben-Israel| first2 = Thomas N.E. | last2 = Greville |author2-link=Thomas N. E. Greville|title=Generalized inverses: Theory and applications| * {{cite book| first1 = Adi | last1 = Ben-Israel |author1-link=Adi Ben-Israel| first2 = Thomas N.E. | last2 = Greville |author2-link=Thomas N. E. Greville|title=Generalized inverses: Theory and applications|
edition= 2nd |location= New York, NY | publisher = Springer |year=2003| isbn = 978-0-387-00293-4 | doi = 10.1007/b97366 }} edition= 2nd |location= New York, NY | publisher = Springer |year=2003| isbn = 978-0-387-00293-4 | doi = 10.1007/b97366 }}
* {{cite book| first1 = S. L. | last1 = Campbell | first2 = C. D. | last2 = Meyer, Jr. | title= Generalized Inverses of Linear Transformations| url = https://archive.org/details/generalizedinver0000camp | url-access = registration | * {{cite book| first1 = S. L. | last1 = Campbell | first2 = C. D. Jr.| last2 = Meyer | title= Generalized Inverses of Linear Transformations| url = https://archive.org/details/generalizedinver0000camp | url-access = registration |
publisher=Dover |year=1991 | isbn = 978-0-486-66693-8 }} publisher=Dover |year=1991 | isbn = 978-0-486-66693-8 }}
* {{cite book| first = Yoshihiko | last = Nakamura | title= Advanced Robotics: Redundancy and Optimization| * {{cite book| first = Yoshihiko | last = Nakamura | title= Advanced Robotics: Redundancy and Optimization|
Line 360: Line 355:


==External links== ==External links==
* * {{PlanetMath |urlname=Pseudoinverse |title=Pseudoinverse}}
* *
* {{planetmath reference|id=6067|title=Moore–Penrose inverse}} * {{PlanetMath |urlname=MoorePenroseGeneralizedInverse |title=Moore–Penrose generalized inverse}}
* {{MathWorld|urlname=Pseudoinverse|title=Pseudoinverse}} * {{MathWorld|urlname=Pseudoinverse|title=Pseudoinverse}}
* {{MathWorld|urlname=Moore-PenroseMatrixInverse|title=Moore–Penrose Inverse}} * {{MathWorld|urlname=Moore-PenroseMatrixInverse|title=Moore–Penrose Inverse}}
* *
* *


{{Numerical linear algebra}} {{Numerical linear algebra}}
{{Roger Penrose}} {{Roger Penrose}}


{{DEFAULTSORT:Moore-Penrose Pseudoinverse}} {{DEFAULTSORT:Moore-Penrose inverse}}
] ]
] ]

Latest revision as of 17:11, 26 December 2024

Most widely known generalized inverse of a matrix

In mathematics, and in particular linear algebra, the Moore–Penrose inverse A + {\displaystyle A^{+}} ⁠ of a matrix A {\displaystyle A} ⁠, often called the pseudoinverse, is the most widely known generalization of the inverse matrix. It was independently described by E. H. Moore in 1920, Arne Bjerhammar in 1951, and Roger Penrose in 1955. Earlier, Erik Ivar Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. The terms pseudoinverse and generalized inverse are sometimes used as synonyms for the Moore–Penrose inverse of a matrix, but sometimes applied to other elements of algebraic structures which share some but not all properties expected for an inverse element.

A common use of the pseudoinverse is to compute a "best fit" (least squares) approximate solution to a system of linear equations that lacks an exact solution (see below under § Applications). Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The pseudoinverse facilitates the statement and proof of results in linear algebra.

The pseudoinverse is defined for all rectangular matrices whose entries are real or complex numbers. Given a rectangular matrix with real or complex entries, its pseudoinverse is unique. It can be computed using the singular value decomposition. In the special case where ⁠ A {\displaystyle A} ⁠ is a normal matrix (for example, a Hermitian matrix), the pseudoinverse ⁠ A + {\displaystyle A^{+}} annihilates the kernel of ⁠ A {\displaystyle A} ⁠ and acts as a traditional inverse of ⁠ A {\displaystyle A} ⁠ on the subspace orthogonal to the kernel.

Notation

In the following discussion, the following conventions are adopted.

  • K {\displaystyle \mathbb {K} } ⁠ will denote one of the fields of real or complex numbers, denoted ⁠ R {\displaystyle \mathbb {R} } ⁠, ⁠ C {\displaystyle \mathbb {C} } ⁠, respectively. The vector space of ⁠ m × n {\displaystyle m\times n} ⁠ matrices over ⁠ K {\displaystyle \mathbb {K} } ⁠ is denoted by ⁠ K m × n {\displaystyle \mathbb {K} ^{m\times n}} ⁠.
  • For ⁠ A K m × n {\displaystyle A\in \mathbb {K} ^{m\times n}} ⁠, the transpose is denoted ⁠ A T {\displaystyle A^{\operatorname {T} }} ⁠ and the Hermitian transpose (also called conjugate transpose) is denoted ⁠ A {\displaystyle A^{*}} ⁠. If K = R {\displaystyle \mathbb {K} =\mathbb {R} } , then A = A T {\displaystyle A^{*}=A^{\operatorname {T} }} .
  • For ⁠ A K m × n {\displaystyle A\in \mathbb {K} ^{m\times n}} ⁠, ⁠ ran ( A ) {\displaystyle \operatorname {ran} (A)} ⁠ (standing for "range") denotes the column space (image) of ⁠ A {\displaystyle A} ⁠ (the space spanned by the column vectors of ⁠ A {\displaystyle A} ⁠) and ⁠ ker ( A ) {\displaystyle \ker(A)} ⁠ denotes the kernel (null space) of ⁠ A {\displaystyle A} ⁠.
  • For any positive integer ⁠ n {\displaystyle n} ⁠, the ⁠ n × n {\displaystyle n\times n} identity matrix is denoted ⁠ I n K n × n {\displaystyle I_{n}\in \mathbb {K} ^{n\times n}} ⁠.

Definition

For A K m × n {\displaystyle A\in \mathbb {K} ^{m\times n}} , a pseudoinverse of A is defined as a matrix ⁠ A + K n × m {\displaystyle A^{+}\in \mathbb {K} ^{n\times m}} ⁠ satisfying all of the following four criteria, known as the Moore–Penrose conditions:

  1. A A + {\displaystyle AA^{+}} ⁠ need not be the general identity matrix, but it maps all column vectors of A to themselves: A A + A = A . {\displaystyle AA^{+}A=\;A.}
  2. A + {\displaystyle A^{+}} ⁠ acts like a weak inverse: A + A A + = A + . {\displaystyle A^{+}AA^{+}=\;A^{+}.}
  3. A A + {\displaystyle AA^{+}} ⁠ is Hermitian: ( A A + ) = A A + . {\displaystyle \left(AA^{+}\right)^{*}=\;AA^{+}.}
  4. A + A {\displaystyle A^{+}A} ⁠ is also Hermitian: ( A + A ) = A + A . {\displaystyle \left(A^{+}A\right)^{*}=\;A^{+}A.}

Note that A + A {\displaystyle A^{+}A} and A A + {\displaystyle AA^{+}} are idempotent operators, as follows from ( A A + ) 2 = A A + {\displaystyle (AA^{+})^{2}=AA^{+}} and ( A + A ) 2 = A + A {\displaystyle (A^{+}A)^{2}=A^{+}A} . More specifically, A + A {\displaystyle A^{+}A} projects onto the image of A T {\displaystyle A^{T}} (equivalently, the span of the rows of A {\displaystyle A} ), and A A + {\displaystyle AA^{+}} projects onto the image of A {\displaystyle A} (equivalently, the span of the columns of A {\displaystyle A} ). In fact, the above four conditions are fully equivalent to A + A {\displaystyle A^{+}A} and A A + {\displaystyle AA^{+}} being such orthogonal projections: A A + {\displaystyle AA^{+}} projecting onto the image of A {\displaystyle A} implies ( A A + ) A = A {\displaystyle (AA^{+})A=A} , and A + A {\displaystyle A^{+}A} projecting onto the image of A T {\displaystyle A^{T}} implies ( A + A ) A T = A T {\displaystyle (A^{+}A)A^{T}=A^{T}} .

The pseudoinverse A + {\displaystyle A^{+}} exists for any matrix A K m × n {\displaystyle A\in \mathbb {K} ^{m\times n}} . If furthermore A {\displaystyle A} is full rank, that is, its rank is ⁠ min { m , n } {\displaystyle \min\{m,n\}} ⁠, then ⁠ A + {\displaystyle A^{+}} ⁠ can be given a particularly simple algebraic expression. In particular:

  • When ⁠ A {\displaystyle A} ⁠ has linearly independent columns (equivalently, ⁠ A {\displaystyle A} ⁠ is injective, and thus ⁠ A A {\displaystyle A^{*}A} ⁠ is invertible), ⁠ A + {\displaystyle A^{+}} ⁠ can be computed as A + = ( A A ) 1 A . {\displaystyle A^{+}=\left(A^{*}A\right)^{-1}A^{*}.} This particular pseudoinverse is a left inverse, that is, A + A = I {\displaystyle A^{+}A=I} .
  • If, on the other hand, A {\displaystyle A} has linearly independent rows (equivalently, A {\displaystyle A} is surjective, and thus ⁠ A A {\displaystyle AA^{*}} ⁠ is invertible), ⁠ A + {\displaystyle A^{+}} ⁠ can be computed as A + = A ( A A ) 1 . {\displaystyle A^{+}=A^{*}\left(AA^{*}\right)^{-1}.} This is a right inverse, as A A + = I {\displaystyle AA^{+}=I} .

In the more general case, the pseudoinverse can be expressed leveraging the singular value decomposition. Any matrix can be decomposed as A = U D V {\displaystyle A=UDV^{*}} for some isometries U , V {\displaystyle U,V} and diagonal nonnegative real matrix D {\displaystyle D} . The pseudoinverse can then be written as A + = V D + U {\displaystyle A^{+}=VD^{+}U^{*}} , where D + {\displaystyle D^{+}} is the pseudoinverse of D {\displaystyle D} and can be obtained by transposing the matrix and replacing the nonzero values with their multiplicative inverses. That this matrix satisfies the above requirement is directly verified observing that A A + = U U {\displaystyle AA^{+}=UU^{*}} and A + A = V V {\displaystyle A^{+}A=VV^{*}} , which are the projections onto image and support of A {\displaystyle A} , respectively.

Properties

Existence and uniqueness

As discussed above, for any matrix ⁠ A {\displaystyle A} ⁠ there is one and only one pseudoinverse ⁠ A + {\displaystyle A^{+}} ⁠.

A matrix satisfying only the first of the conditions given above, namely A A + A = A {\textstyle AA^{+}A=A} , is known as a generalized inverse. If the matrix also satisfies the second condition, namely A + A A + = A + {\textstyle A^{+}AA^{+}=A^{+}} , it is called a generalized reflexive inverse. Generalized inverses always exist but are not in general unique. Uniqueness is a consequence of the last two conditions.

Basic properties

Proofs for the properties below can be found at b:Topics in Abstract Algebra/Linear algebra.

  • If ⁠ A {\displaystyle A} ⁠ has real entries, then so does ⁠ A + {\displaystyle A^{+}} ⁠.
  • If ⁠ A {\displaystyle A} ⁠ is invertible, its pseudoinverse is its inverse. That is, A + = A 1 {\displaystyle A^{+}=A^{-1}} .
  • The pseudoinverse of the pseudoinverse is the original matrix: ( A + ) + = A {\displaystyle \left(A^{+}\right)^{+}=A} .
  • Pseudoinversion commutes with transposition, complex conjugation, and taking the conjugate transpose: ( A T ) + = ( A + ) T , ( A ¯ ) + = A + ¯ , ( A ) + = ( A + ) . {\displaystyle \left(A^{\operatorname {T} }\right)^{+}=\left(A^{+}\right)^{\operatorname {T} },\quad \left({\overline {A}}\right)^{+}={\overline {A^{+}}},\quad \left(A^{*}\right)^{+}=\left(A^{+}\right)^{*}.}
  • The pseudoinverse of a scalar multiple of ⁠ A {\displaystyle A} ⁠ is the reciprocal multiple of ⁠ A + {\displaystyle A^{+}} ⁠: ( α A ) + = α 1 A + {\displaystyle \left(\alpha A\right)^{+}=\alpha ^{-1}A^{+}} for ⁠ α 0 {\displaystyle \alpha \neq 0} ⁠; otherwise, ( 0 A ) + = 0 A + = 0 A T {\displaystyle \left(0A\right)^{+}=0A^{+}=0A^{\operatorname {T} }} , or 0 + = 0 T {\displaystyle 0^{+}=0^{\operatorname {T} }} .
  • The kernel and image of the pseudoinverse coincide with those of the conjugate transpose: ker ( A + ) = ker ( A ) {\displaystyle \ker \left(A^{+}\right)=\ker \left(A^{*}\right)} and ran ( A + ) = ran ( A ) {\displaystyle \operatorname {ran} \left(A^{+}\right)=\operatorname {ran} \left(A^{*}\right)} .

Identities

The following identity formula can be used to cancel or expand certain subexpressions involving pseudoinverses: A = A A A + = A + A A . {\displaystyle A={}A{}A^{*}{}A^{+*}{}={}A^{+*}{}A^{*}{}A.} Equivalently, substituting A + {\displaystyle A^{+}} for A {\displaystyle A} gives A + = A + A + A = A A + A + , {\displaystyle A^{+}={}A^{+}{}A^{+*}{}A^{*}{}={}A^{*}{}A^{+*}{}A^{+},} while substituting A {\displaystyle A^{*}} for A {\displaystyle A} gives A = A A A + = A + A A . {\displaystyle A^{*}={}A^{*}{}A{}A^{+}{}={}A^{+}{}A{}A^{*}.}

Reduction to Hermitian case

The computation of the pseudoinverse is reducible to its construction in the Hermitian case. This is possible through the equivalences: A + = ( A A ) + A , {\displaystyle A^{+}=\left(A^{*}A\right)^{+}A^{*},} A + = A ( A A ) + , {\displaystyle A^{+}=A^{*}\left(AA^{*}\right)^{+},}

as ⁠ A A {\displaystyle A^{*}A} ⁠ and ⁠ A A {\displaystyle AA^{*}} ⁠ are Hermitian.

Pseudoinverse of products

The equality ⁠ ( A B ) + = B + A + {\displaystyle (AB)^{+}=B^{+}A^{+}} ⁠ does not hold in general. Rather, suppose ⁠ A K m × n ,   B K n × p {\displaystyle A\in \mathbb {K} ^{m\times n},\ B\in \mathbb {K} ^{n\times p}} ⁠. Then the following are equivalent:

  1. ( A B ) + = B + A + {\textstyle (AB)^{+}=B^{+}A^{+}}
  2. A + A B B A = B B A {\displaystyle A^{+}ABB^{*}A^{*}=BB^{*}A^{*}} and B B + A A B = A A B {\displaystyle BB^{+}A^{*}AB=A^{*}AB}
  3. ( A + A B B ) = A + A B B {\textstyle \left(A^{+}ABB^{*}\right)^{*}=A^{+}ABB^{*}} and ( A A B B + ) = A A B B + {\displaystyle \left(A^{*}ABB^{+}\right)^{*}=A^{*}ABB^{+}}
  4. A + A B B A A B B + = B B A A {\textstyle A^{+}ABB^{*}A^{*}ABB^{+}=BB^{*}A^{*}A}
  5. A + A B = B ( A B ) + A B {\textstyle A^{+}AB=B(AB)^{+}AB} and B B + A = A A B ( A B ) + {\displaystyle BB^{+}A^{*}=A^{*}AB(AB)^{+}} .

The following are sufficient conditions for ⁠ ( A B ) + = B + A + {\displaystyle (AB)^{+}=B^{+}A^{+}} ⁠:

  1. A {\displaystyle A} ⁠ has orthonormal columns (then A A = A + A = I n {\displaystyle A^{*}A=A^{+}A=I_{n}} ), or
  2. B {\displaystyle B} ⁠ has orthonormal rows (then B B = B B + = I n {\displaystyle BB^{*}=BB^{+}=I_{n}} ), or
  3. A {\displaystyle A} ⁠ has linearly independent columns (then A + A = I {\displaystyle A^{+}A=I} ) and ⁠ B {\displaystyle B} ⁠ has linearly independent rows (then B B + = I {\displaystyle BB^{+}=I} ),   or
  4. B = A {\displaystyle B=A^{*}} , or
  5. B = A + {\displaystyle B=A^{+}} .

The following is a necessary condition for ⁠ ( A B ) + = B + A + {\displaystyle (AB)^{+}=B^{+}A^{+}} ⁠:

  1. ( A + A ) ( B B + ) = ( B B + ) ( A + A ) {\displaystyle (A^{+}A)(BB^{+})=(BB^{+})(A^{+}A)}

The fourth sufficient condition yields the equalities ( A A ) + = A + A + , ( A A ) + = A + A + . {\displaystyle {\begin{aligned}\left(AA^{*}\right)^{+}&=A^{+*}A^{+},\\\left(A^{*}A\right)^{+}&=A^{+}A^{+*}.\end{aligned}}}

Here is a counterexample where ⁠ ( A B ) + B + A + {\displaystyle (AB)^{+}\neq B^{+}A^{+}} ⁠:

( ( 1 1 0 0 ) ( 0 0 1 1 ) ) + = ( 1 1 0 0 ) + = ( 1 2 0 1 2 0 ) ( 1 4 0 1 4 0 ) = ( 0 1 2 0 1 2 ) ( 1 2 0 1 2 0 ) = ( 0 0 1 1 ) + ( 1 1 0 0 ) + {\displaystyle {\Biggl (}{\begin{pmatrix}1&1\\0&0\end{pmatrix}}{\begin{pmatrix}0&0\\1&1\end{pmatrix}}{\Biggr )}^{+}={\begin{pmatrix}1&1\\0&0\end{pmatrix}}^{+}={\begin{pmatrix}{\tfrac {1}{2}}&0\\{\tfrac {1}{2}}&0\end{pmatrix}}\quad \neq \quad {\begin{pmatrix}{\tfrac {1}{4}}&0\\{\tfrac {1}{4}}&0\end{pmatrix}}={\begin{pmatrix}0&{\tfrac {1}{2}}\\0&{\tfrac {1}{2}}\end{pmatrix}}{\begin{pmatrix}{\tfrac {1}{2}}&0\\{\tfrac {1}{2}}&0\end{pmatrix}}={\begin{pmatrix}0&0\\1&1\end{pmatrix}}^{+}{\begin{pmatrix}1&1\\0&0\end{pmatrix}}^{+}}

Projectors

P = A A + {\displaystyle P=AA^{+}} and Q = A + A {\displaystyle Q=A^{+}A} are orthogonal projection operators, that is, they are Hermitian ( P = P {\displaystyle P=P^{*}} , Q = Q {\displaystyle Q=Q^{*}} ) and idempotent ( P 2 = P {\displaystyle P^{2}=P} and Q 2 = Q {\displaystyle Q^{2}=Q} ). The following hold:

  • P A = A Q = A {\displaystyle PA=AQ=A} and A + P = Q A + = A + {\displaystyle A^{+}P=QA^{+}=A^{+}}
  • P {\displaystyle P} ⁠ is the orthogonal projector onto the range of ⁠ A {\displaystyle A} ⁠ (which equals the orthogonal complement of the kernel of ⁠ A {\displaystyle A^{*}} ⁠).
  • Q {\displaystyle Q} ⁠ is the orthogonal projector onto the range of ⁠ A {\displaystyle A^{*}} ⁠ (which equals the orthogonal complement of the kernel of ⁠ A {\displaystyle A} ⁠).
  • I Q = I A + A {\displaystyle I-Q=I-A^{+}A} is the orthogonal projector onto the kernel of ⁠ A {\displaystyle A} ⁠.
  • I P = I A A + {\displaystyle I-P=I-AA^{+}} is the orthogonal projector onto the kernel of ⁠ A {\displaystyle A^{*}} ⁠.

The last two properties imply the following identities:

  • A   ( I A + A ) = ( I A A + ) A     = 0 {\displaystyle A\,\ \left(I-A^{+}A\right)=\left(I-AA^{+}\right)A\ \ =0}
  • A ( I A A + ) = ( I A + A ) A = 0 {\displaystyle A^{*}\left(I-AA^{+}\right)=\left(I-A^{+}A\right)A^{*}=0}

Another property is the following: if ⁠ A K n × n {\displaystyle A\in \mathbb {K} ^{n\times n}} ⁠ is Hermitian and idempotent (true if and only if it represents an orthogonal projection), then, for any matrix ⁠ B K m × n {\displaystyle B\in \mathbb {K} ^{m\times n}} ⁠ the following equation holds: A ( B A ) + = ( B A ) + {\displaystyle A(BA)^{+}=(BA)^{+}}

This can be proven by defining matrices C = B A {\displaystyle C=BA} , D = A ( B A ) + {\displaystyle D=A(BA)^{+}} , and checking that ⁠ D {\displaystyle D} ⁠ is indeed a pseudoinverse for ⁠ C {\displaystyle C} ⁠ by verifying that the defining properties of the pseudoinverse hold, when ⁠ A {\displaystyle A} ⁠ is Hermitian and idempotent.

From the last property it follows that, if ⁠ A K n × n {\displaystyle A\in \mathbb {K} ^{n\times n}} ⁠ is Hermitian and idempotent, for any matrix ⁠ B K n × m {\displaystyle B\in \mathbb {K} ^{n\times m}} ( A B ) + A = ( A B ) + {\displaystyle (AB)^{+}A=(AB)^{+}}

Finally, if ⁠ A {\displaystyle A} ⁠ is an orthogonal projection matrix, then its pseudoinverse trivially coincides with the matrix itself, that is, A + = A {\displaystyle A^{+}=A} .

Geometric construction

If we view the matrix as a linear map ⁠ A : K n K m {\displaystyle A:\mathbb {K} ^{n}\to \mathbb {K} ^{m}} ⁠ over the field ⁠ K {\displaystyle \mathbb {K} } ⁠ then ⁠ A + : K m K n {\displaystyle A^{+}:\mathbb {K} ^{m}\to \mathbb {K} ^{n}} ⁠ can be decomposed as follows. We write ⁠ {\displaystyle \oplus } ⁠ for the direct sum, ⁠ {\displaystyle \perp } ⁠ for the orthogonal complement, ⁠ ker {\displaystyle \ker } ⁠ for the kernel of a map, and ⁠ ran {\displaystyle \operatorname {ran} } ⁠ for the image of a map. Notice that K n = ( ker A ) ker A {\displaystyle \mathbb {K} ^{n}=\left(\ker A\right)^{\perp }\oplus \ker A} and K m = ran A ( ran A ) {\displaystyle \mathbb {K} ^{m}=\operatorname {ran} A\oplus \left(\operatorname {ran} A\right)^{\perp }} . The restriction A : ( ker A ) ran A {\displaystyle A:\left(\ker A\right)^{\perp }\to \operatorname {ran} A} is then an isomorphism. This implies that ⁠ A + {\displaystyle A^{+}} ⁠ on ⁠ ran A {\displaystyle \operatorname {ran} A} ⁠ is the inverse of this isomorphism, and is zero on ( ran A ) . {\displaystyle \left(\operatorname {ran} A\right)^{\perp }.}

In other words: To find ⁠ A + b {\displaystyle A^{+}b} ⁠ for given ⁠ b {\displaystyle b} ⁠ in ⁠ K m {\displaystyle \mathbb {K} ^{m}} ⁠, first project ⁠ b {\displaystyle b} ⁠ orthogonally onto the range of ⁠ A {\displaystyle A} ⁠, finding a point ⁠ p ( b ) {\displaystyle p(b)} ⁠ in the range. Then form ⁠ A 1 ( { p ( b ) } ) {\displaystyle A^{-1}(\{p(b)\})} ⁠, that is, find those vectors in ⁠ K n {\displaystyle \mathbb {K} ^{n}} ⁠ that ⁠ A {\displaystyle A} ⁠ sends to ⁠ p ( b ) {\displaystyle p(b)} ⁠. This will be an affine subspace of ⁠ K n {\displaystyle \mathbb {K} ^{n}} ⁠ parallel to the kernel of ⁠ A {\displaystyle A} ⁠. The element of this subspace that has the smallest length (that is, is closest to the origin) is the answer ⁠ A + b {\displaystyle A^{+}b} ⁠ we are looking for. It can be found by taking an arbitrary member of ⁠ A 1 ( { p ( b ) } ) {\displaystyle A^{-1}(\{p(b)\})} ⁠ and projecting it orthogonally onto the orthogonal complement of the kernel of ⁠ A {\displaystyle A} ⁠.

This description is closely related to the minimum-norm solution to a linear system.

Limit relations

The pseudoinverse are limits: A + = lim δ 0 ( A A + δ I ) 1 A = lim δ 0 A ( A A + δ I ) 1 {\displaystyle A^{+}=\lim _{\delta \searrow 0}\left(A^{*}A+\delta I\right)^{-1}A^{*}=\lim _{\delta \searrow 0}A^{*}\left(AA^{*}+\delta I\right)^{-1}} (see Tikhonov regularization). These limits exist even if ⁠ ( A A ) 1 {\displaystyle \left(AA^{*}\right)^{-1}} ⁠ or ⁠ ( A A ) 1 {\displaystyle \left(A^{*}A\right)^{-1}} ⁠ do not exist.

Continuity

In contrast to ordinary matrix inversion, the process of taking pseudoinverses is not continuous: if the sequence ⁠ ( A n ) {\displaystyle \left(A_{n}\right)} ⁠ converges to the matrix ⁠ A {\displaystyle A} ⁠ (in the maximum norm or Frobenius norm, say), then ⁠ ( A n ) + {\displaystyle (A_{n})^{+}} ⁠ need not converge to ⁠ A + {\displaystyle A^{+}} ⁠. However, if all the matrices ⁠ A n {\displaystyle A_{n}} ⁠ have the same rank as ⁠ A {\displaystyle A} ⁠, ⁠ ( A n ) + {\displaystyle (A_{n})^{+}} ⁠ will converge to ⁠ A + {\displaystyle A^{+}} ⁠.

Derivative

Let x A ( x ) {\displaystyle x\mapsto A(x)} be a real-valued differentiable matrix function with constant rank at a point ⁠ x 0 {\displaystyle x_{0}} ⁠. The derivative of x A + ( x ) {\displaystyle x\mapsto A^{+}(x)} at x 0 {\displaystyle x_{0}} may be calculated in terms of the derivative of A {\displaystyle A} at x 0 {\displaystyle x_{0}} : d d x | x = x 0 A + = A + ( d A d x ) A +   +   A + A + ( d A d x ) ( I A A + )   +   ( I A + A ) ( d A d x ) A + A + , {\displaystyle \left.{\frac {\mathrm {d} }{\mathrm {d} x}}\right|_{x=x_{0}\!\!\!\!\!\!\!}A^{+}=-A^{+}\left({\frac {\mathrm {d} A}{\mathrm {d} x}}\right)A^{+}~+~A^{+}A^{+\top }\left({\frac {\mathrm {d} A^{\top }}{\mathrm {d} x}}\right)\left(I-AA^{+}\right)~+~\left(I-A^{+}A\right)\left({\frac {\mathrm {d} A^{\top }}{\mathrm {d} x}}\right)A^{+\top }A^{+},} where the functions A {\displaystyle A} , A + {\displaystyle A^{+}} and derivatives on the right side are evaluated at x 0 {\displaystyle x_{0}} (that is, A := A ( x 0 ) {\displaystyle A:=A(x_{0})} , A + := A + ( x 0 ) {\displaystyle A^{+}:=A^{+}(x_{0})} , etc.). For a complex matrix, the transpose is replaced with the conjugate transpose. For a real-valued symmetric matrix, the Magnus-Neudecker derivative is established.

Examples

Since for invertible matrices the pseudoinverse equals the usual inverse, only examples of non-invertible matrices are considered below.

  • For A = ( 0 0 0 0 ) , {\displaystyle A={\begin{pmatrix}0&0\\0&0\end{pmatrix}},} the pseudoinverse is A + = ( 0 0 0 0 ) . {\displaystyle A^{+}={\begin{pmatrix}0&0\\0&0\end{pmatrix}}.} The uniqueness of this pseudoinverse can be seen from the requirement A + = A + A A + {\displaystyle A^{+}=A^{+}AA^{+}} , since multiplication by a zero matrix would always produce a zero matrix.
  • For A = ( 1 0 1 0 ) , {\displaystyle A={\begin{pmatrix}1&0\\1&0\end{pmatrix}},} the pseudoinverse is A + = ( 1 2 1 2 0 0 ) {\displaystyle A^{+}={\begin{pmatrix}{\frac {1}{2}}&{\frac {1}{2}}\\0&0\end{pmatrix}}} .
Indeed, A A + = ( 1 2 1 2 1 2 1 2 ) {\displaystyle A\,A^{+}={\begin{pmatrix}{\frac {1}{2}}&{\frac {1}{2}}\\{\frac {1}{2}}&{\frac {1}{2}}\end{pmatrix}}} , and thus A A + A = ( 1 0 1 0 ) = A {\displaystyle A\,A^{+}A={\begin{pmatrix}1&0\\1&0\end{pmatrix}}=A} . Similarly, A + A = ( 1 0 0 0 ) {\displaystyle A^{+}A={\begin{pmatrix}1&0\\0&0\end{pmatrix}}} , and thus A + A A + = ( 1 2 1 2 0 0 ) = A + {\displaystyle A^{+}A\,A^{+}={\begin{pmatrix}{\frac {1}{2}}&{\frac {1}{2}}\\0&0\end{pmatrix}}=A^{+}} .
Note that ⁠ A {\displaystyle A} ⁠ is neither injective nor surjective, and thus the pseudoinverse cannot be computed via A + = ( A A ) 1 A {\displaystyle A^{+}=\left(A^{*}A\right)^{-1}A^{*}} nor A + = A ( A A ) 1 {\displaystyle A^{+}=A^{*}\left(AA^{*}\right)^{-1}} , as A A {\displaystyle A^{*}A} and A A {\displaystyle AA^{*}} are both singular, and furthermore A + {\displaystyle A^{+}} is neither a left nor a right inverse.
Nonetheless, the pseudoinverse can be computed via SVD observing that A = 2 ( e 1 + e 2 2 ) e 1 {\displaystyle A={\sqrt {2}}\left({\frac {\mathbf {e} _{1}+\mathbf {e} _{2}}{\sqrt {2}}}\right)\mathbf {e} _{1}^{*}} , and thus A + = 1 2 e 1 ( e 1 + e 2 2 ) {\displaystyle A^{+}={\frac {1}{\sqrt {2}}}\,\mathbf {e} _{1}\left({\frac {\mathbf {e} _{1}+\mathbf {e} _{2}}{\sqrt {2}}}\right)^{*}} .
  • For A = ( 1 0 1 0 ) , {\displaystyle A={\begin{pmatrix}1&0\\-1&0\end{pmatrix}},} A + = ( 1 2 1 2 0 0 ) . {\displaystyle A^{+}={\begin{pmatrix}{\frac {1}{2}}&-{\frac {1}{2}}\\0&0\end{pmatrix}}.}
  • For A = ( 1 0 2 0 ) , {\displaystyle A={\begin{pmatrix}1&0\\2&0\end{pmatrix}},} A + = ( 1 5 2 5 0 0 ) {\displaystyle A^{+}={\begin{pmatrix}{\frac {1}{5}}&{\frac {2}{5}}\\0&0\end{pmatrix}}} . The denominators are here 5 = 1 2 + 2 2 {\displaystyle 5=1^{2}+2^{2}} .
  • For A = ( 1 1 1 1 ) , {\displaystyle A={\begin{pmatrix}1&1\\1&1\end{pmatrix}},} A + = ( 1 4 1 4 1 4 1 4 ) . {\displaystyle A^{+}={\begin{pmatrix}{\frac {1}{4}}&{\frac {1}{4}}\\{\frac {1}{4}}&{\frac {1}{4}}\end{pmatrix}}.}
  • For A = ( 1 0 0 1 0 1 ) , {\displaystyle A={\begin{pmatrix}1&0\\0&1\\0&1\end{pmatrix}},} the pseudoinverse is A + = ( 1 0 0 0 1 2 1 2 ) {\displaystyle A^{+}={\begin{pmatrix}1&0&0\\0&{\frac {1}{2}}&{\frac {1}{2}}\end{pmatrix}}} .
For this matrix, the left inverse exists and thus equals A + {\displaystyle A^{+}} , indeed, A + A = ( 1 0 0 1 ) . {\displaystyle A^{+}A={\begin{pmatrix}1&0\\0&1\end{pmatrix}}.}


Special cases

Scalars

It is also possible to define a pseudoinverse for scalars and vectors. This amounts to treating these as matrices. The pseudoinverse of a scalar ⁠ x {\displaystyle x} ⁠ is zero if ⁠ x {\displaystyle x} ⁠ is zero and the reciprocal of ⁠ x {\displaystyle x} ⁠ otherwise: x + = { 0 , if  x = 0 ; x 1 , otherwise . {\displaystyle x^{+}={\begin{cases}0,&{\mbox{if }}x=0;\\x^{-1},&{\mbox{otherwise}}.\end{cases}}}

Vectors

The pseudoinverse of the null (all zero) vector is the transposed null vector. The pseudoinverse of a non-null vector is the conjugate transposed vector divided by its squared magnitude:

x + = { 0 T , if  x = 0 ; x / ( x x ) , otherwise . {\displaystyle {\vec {x}}^{+}={\begin{cases}{\vec {0}}^{\operatorname {T} },&{\text{if }}{\vec {x}}={\vec {0}};\\{\vec {x}}^{*}/({\vec {x}}^{*}{\vec {x}}),&{\text{otherwise}}.\end{cases}}}

Diagonal matrices

The pseudoinverse of a squared diagonal matrix is obtained by taking the reciprocal of the nonzero diagonal elements. Formally, if D {\displaystyle D} is a squared diagonal matrix with D = D ~ 0 k × k {\displaystyle D={\tilde {D}}\oplus \mathbf {0} _{k\times k}} and D ~ > 0 {\displaystyle {\tilde {D}}>0} , then D + = D ~ 1 0 k × k {\displaystyle D^{+}={\tilde {D}}^{-1}\oplus \mathbf {0} _{k\times k}} . More generally, if A {\displaystyle A} is any m × n {\displaystyle m\times n} rectangular matrix whose only nonzero elements are on the diagonal, meaning A i j = δ i j a i {\displaystyle A_{ij}=\delta _{ij}a_{i}} , a i K {\displaystyle a_{i}\in \mathbb {K} } , then A + {\displaystyle A^{+}} is a n × m {\displaystyle n\times m} rectangular matrix whose diagonal elements are the reciprocal of the original ones, that is, A i i 0 A i i + = 1 / A i i {\displaystyle A_{ii}\neq 0\implies A_{ii}^{+}=1/A_{ii}} .

Linearly independent columns

If the rank of ⁠ A {\displaystyle A} ⁠ is identical to the number of columns, ⁠ n {\displaystyle n} ⁠, (for ⁠ n m {\displaystyle n\leq m} ⁠,) there are ⁠ n {\displaystyle n} linearly independent columns, and ⁠ A A {\displaystyle A^{*}A} ⁠ is invertible. In this case, an explicit formula is: A + = ( A A ) 1 A . {\displaystyle A^{+}=\left(A^{*}A\right)^{-1}A^{*}.}

It follows that ⁠ A + {\displaystyle A^{+}} ⁠ is then a left inverse of ⁠ A {\displaystyle A} ⁠:   A + A = I n {\displaystyle A^{+}A=I_{n}} .

Linearly independent rows

If the rank of ⁠ A {\displaystyle A} ⁠ is identical to the number of rows, ⁠ m {\displaystyle m} ⁠, (for ⁠ m n {\displaystyle m\leq n} ⁠,) there are ⁠ m {\displaystyle m} linearly independent rows, and ⁠ A A {\displaystyle AA^{*}} ⁠ is invertible. In this case, an explicit formula is: A + = A ( A A ) 1 . {\displaystyle A^{+}=A^{*}\left(AA^{*}\right)^{-1}.}

It follows that ⁠ A + {\displaystyle A^{+}} ⁠ is a right inverse of ⁠ A {\displaystyle A} ⁠:   A A + = I m {\displaystyle AA^{+}=I_{m}} .

Orthonormal columns or rows

This is a special case of either full column rank or full row rank (treated above). If ⁠ A {\displaystyle A} ⁠ has orthonormal columns ( A A = I n {\displaystyle A^{*}A=I_{n}} ) or orthonormal rows ( A A = I m {\displaystyle AA^{*}=I_{m}} ), then: A + = A . {\displaystyle A^{+}=A^{*}.}

Normal matrices

If ⁠ A {\displaystyle A} ⁠ is normal, that is, it commutes with its conjugate transpose, then its pseudoinverse can be computed by diagonalizing it, mapping all nonzero eigenvalues to their inverses, and mapping zero eigenvalues to zero. A corollary is that ⁠ A {\displaystyle A} ⁠ commuting with its transpose implies that it commutes with its pseudoinverse.

EP matrices

A (square) matrix ⁠ A {\displaystyle A} ⁠ is said to be an EP matrix if it commutes with its pseudoinverse. In such cases (and only in such cases), it is possible to obtain the pseudoinverse as a polynomial in ⁠ A {\displaystyle A} ⁠. A polynomial p ( t ) {\displaystyle p(t)} such that A + = p ( A ) {\displaystyle A^{+}=p(A)} can be easily obtained from the characteristic polynomial of ⁠ A {\displaystyle A} ⁠ or, more generally, from any annihilating polynomial of ⁠ A {\displaystyle A} ⁠.

Orthogonal projection matrices

This is a special case of a normal matrix with eigenvalues 0 and 1. If ⁠ A {\displaystyle A} ⁠ is an orthogonal projection matrix, that is, A = A {\displaystyle A=A^{*}} and A 2 = A {\displaystyle A^{2}=A} , then the pseudoinverse trivially coincides with the matrix itself: A + = A . {\displaystyle A^{+}=A.}

Circulant matrices

For a circulant matrix C {\displaystyle C} ⁠, the singular value decomposition is given by the Fourier transform, that is, the singular values are the Fourier coefficients. Let ⁠ F {\displaystyle {\mathcal {F}}} ⁠ be the Discrete Fourier Transform (DFT) matrix; then C = F Σ F , C + = F Σ + F . {\displaystyle {\begin{aligned}C&={\mathcal {F}}\cdot \Sigma \cdot {\mathcal {F}}^{*},\\C^{+}&={\mathcal {F}}\cdot \Sigma ^{+}\cdot {\mathcal {F}}^{*}.\end{aligned}}}

Construction

Rank decomposition

Let ⁠ r min ( m , n ) {\displaystyle r\leq \min(m,n)} ⁠ denote the rank of ⁠ A K m × n {\displaystyle A\in \mathbb {K} ^{m\times n}} ⁠. Then ⁠ A {\displaystyle A} ⁠ can be (rank) decomposed as A = B C {\displaystyle A=BC} where ⁠ B K m × r {\displaystyle B\in \mathbb {K} ^{m\times r}} ⁠ and ⁠ C K r × n {\displaystyle C\in \mathbb {K} ^{r\times n}} ⁠ are of rank ⁠ r {\displaystyle r} ⁠. Then A + = C + B + = C ( C C ) 1 ( B B ) 1 B {\displaystyle A^{+}=C^{+}B^{+}=C^{*}\left(CC^{*}\right)^{-1}\left(B^{*}B\right)^{-1}B^{*}} .

The QR method

For K { R , C } {\displaystyle \mathbb {K} \in \{\mathbb {R} ,\mathbb {C} \}} computing the product ⁠ A A {\displaystyle AA^{*}} ⁠ or ⁠ A A {\displaystyle A^{*}A} ⁠ and their inverses explicitly is often a source of numerical rounding errors and computational cost in practice. An alternative approach using the QR decomposition of ⁠ A {\displaystyle A} ⁠ may be used instead.

Consider the case when ⁠ A {\displaystyle A} ⁠ is of full column rank, so that A + = ( A A ) 1 A {\displaystyle A^{+}=\left(A^{*}A\right)^{-1}A^{*}} . Then the Cholesky decomposition A A = R R {\displaystyle A^{*}A=R^{*}R} , where ⁠ R {\displaystyle R} ⁠ is an upper triangular matrix, may be used. Multiplication by the inverse is then done easily by solving a system with multiple right-hand sides, A + = ( A A ) 1 A ( A A ) A + = A R R A + = A {\displaystyle A^{+}=\left(A^{*}A\right)^{-1}A^{*}\quad \Leftrightarrow \quad \left(A^{*}A\right)A^{+}=A^{*}\quad \Leftrightarrow \quad R^{*}RA^{+}=A^{*}}

which may be solved by forward substitution followed by back substitution.

The Cholesky decomposition may be computed without forming ⁠ A A {\displaystyle A^{*}A} ⁠ explicitly, by alternatively using the QR decomposition of A = Q R {\displaystyle A=QR} , where Q {\displaystyle Q} has orthonormal columns, Q Q = I {\displaystyle Q^{*}Q=I} , and ⁠ R {\displaystyle R} ⁠ is upper triangular. Then A A = ( Q R ) ( Q R ) = R Q Q R = R R , {\displaystyle A^{*}A\,=\,(QR)^{*}(QR)\,=\,R^{*}Q^{*}QR\,=\,R^{*}R,}

so ⁠ R {\displaystyle R} ⁠ is the Cholesky factor of ⁠ A A {\displaystyle A^{*}A} ⁠.

The case of full row rank is treated similarly by using the formula A + = A ( A A ) 1 {\displaystyle A^{+}=A^{*}\left(AA^{*}\right)^{-1}} and using a similar argument, swapping the roles of ⁠ A {\displaystyle A} ⁠ and ⁠ A {\displaystyle A^{*}} ⁠.

Using polynomials in matrices

For an arbitrary ⁠ A K m × n {\displaystyle A\in \mathbb {K} ^{m\times n}} ⁠, one has that A A {\displaystyle A^{*}A} is normal and, as a consequence, an EP matrix. One can then find a polynomial p ( t ) {\displaystyle p(t)} such that ( A A ) + = p ( A A ) {\displaystyle (A^{*}A)^{+}=p(A^{*}A)} . In this case one has that the pseudoinverse of ⁠ A {\displaystyle A} ⁠ is given by A + = p ( A A ) A = A p ( A A ) . {\displaystyle A^{+}=p(A^{*}A)A^{*}=A^{*}p(AA^{*}).}

Singular value decomposition (SVD)

A computationally simple and accurate way to compute the pseudoinverse is by using the singular value decomposition. If A = U Σ V {\displaystyle A=U\Sigma V^{*}} is the singular value decomposition of ⁠ A {\displaystyle A} ⁠, then A + = V Σ + U {\displaystyle A^{+}=V\Sigma ^{+}U^{*}} . For a rectangular diagonal matrix such as ⁠ Σ {\displaystyle \Sigma } ⁠, we get the pseudoinverse by taking the reciprocal of each non-zero element on the diagonal, leaving the zeros in place. In numerical computation, only elements larger than some small tolerance are taken to be nonzero, and the others are replaced by zeros. For example, in the MATLAB or GNU Octave function pinv, the tolerance is taken to be t = ε⋅max(m, n)⋅max(Σ), where ε is the machine epsilon.

The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than matrix–matrix multiplication, even if a state-of-the art implementation (such as that of LAPACK) is used.

The above procedure shows why taking the pseudoinverse is not a continuous operation: if the original matrix ⁠ A {\displaystyle A} ⁠ has a singular value 0 (a diagonal entry of the matrix ⁠ Σ {\displaystyle \Sigma } ⁠ above), then modifying ⁠ A {\displaystyle A} ⁠ slightly may turn this zero into a tiny positive number, thereby affecting the pseudoinverse dramatically as we now have to take the reciprocal of a tiny number.

Block matrices

Optimized approaches exist for calculating the pseudoinverse of block-structured matrices.

The iterative method of Ben-Israel and Cohen

Another method for computing the pseudoinverse (cf. Drazin inverse) uses the recursion A i + 1 = 2 A i A i A A i , {\displaystyle A_{i+1}=2A_{i}-A_{i}AA_{i},}

which is sometimes referred to as hyper-power sequence. This recursion produces a sequence converging quadratically to the pseudoinverse of ⁠ A {\displaystyle A} ⁠ if it is started with an appropriate ⁠ A 0 {\displaystyle A_{0}} ⁠ satisfying A 0 A = ( A 0 A ) {\displaystyle A_{0}A=\left(A_{0}A\right)^{*}} . The choice A 0 = α A {\displaystyle A_{0}=\alpha A^{*}} (where 0 < α < 2 / σ 1 2 ( A ) {\displaystyle 0<\alpha <2/\sigma _{1}^{2}(A)} , with ⁠ σ 1 ( A ) {\displaystyle \sigma _{1}(A)} ⁠ denoting the largest singular value of ⁠ A {\displaystyle A} ⁠) has been argued not to be competitive to the method using the SVD mentioned above, because even for moderately ill-conditioned matrices it takes a long time before ⁠ A i {\displaystyle A_{i}} ⁠ enters the region of quadratic convergence. However, if started with ⁠ A 0 {\displaystyle A_{0}} ⁠ already close to the Moore–Penrose inverse and A 0 A = ( A 0 A ) {\displaystyle A_{0}A=\left(A_{0}A\right)^{*}} , for example A 0 := ( A A + δ I ) 1 A {\displaystyle A_{0}:=\left(A^{*}A+\delta I\right)^{-1}A^{*}} , convergence is fast (quadratic).

Updating the pseudoinverse

For the cases where ⁠ A {\displaystyle A} ⁠ has full row or column rank, and the inverse of the correlation matrix (⁠ A A {\displaystyle AA^{*}} ⁠ for ⁠ A {\displaystyle A} ⁠ with full row rank or ⁠ A A {\displaystyle A^{*}A} ⁠ for full column rank) is already known, the pseudoinverse for matrices related to ⁠ A {\displaystyle A} ⁠ can be computed by applying the Sherman–Morrison–Woodbury formula to update the inverse of the correlation matrix, which may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.

Similarly, it is possible to update the Cholesky factor when a row or column is added, without creating the inverse of the correlation matrix explicitly. However, updating the pseudoinverse in the general rank-deficient case is much more complicated.

Software libraries

High-quality implementations of SVD, QR, and back substitution are available in standard libraries, such as LAPACK. Writing one's own implementation of SVD is a major programming project that requires a significant numerical expertise. In special circumstances, such as parallel computing or embedded computing, however, alternative implementations by QR or even the use of an explicit inverse might be preferable, and custom implementations may be unavoidable.

The Python package NumPy provides a pseudoinverse calculation through its functions matrix.I and linalg.pinv; its pinv uses the SVD-based algorithm. SciPy adds a function scipy.linalg.pinv that uses a least-squares solver.

The MASS package for R provides a calculation of the Moore–Penrose inverse through the ginv function. The ginv function calculates a pseudoinverse using the singular value decomposition provided by the svd function in the base R package. An alternative is to employ the pinv function available in the pracma package.

The Octave programming language provides a pseudoinverse through the standard package function pinv and the pseudo_inverse() method.

In Julia (programming language), the LinearAlgebra package of the standard library provides an implementation of the Moore–Penrose inverse pinv() implemented via singular-value decomposition.

Applications

Linear least-squares

See also: Linear least squares (mathematics)

The pseudoinverse provides a least squares solution to a system of linear equations. For ⁠ A K m × n {\displaystyle A\in \mathbb {K} ^{m\times n}} ⁠, given a system of linear equations A x = b , {\displaystyle Ax=b,}

in general, a vector ⁠ x {\displaystyle x} ⁠ that solves the system may not exist, or if one does exist, it may not be unique. More specifically, a solution exists if and only if b {\displaystyle b} is in the image of A {\displaystyle A} , and is unique if and only if A {\displaystyle A} is injective. The pseudoinverse solves the "least-squares" problem as follows:

  • x K n {\displaystyle \forall x\in \mathbb {K} ^{n}} ⁠, we have A x b 2 A z b 2 {\displaystyle \left\|Ax-b\right\|_{2}\geq \left\|Az-b\right\|_{2}} where z = A + b {\displaystyle z=A^{+}b} and 2 {\displaystyle \|\cdot \|_{2}} denotes the Euclidean norm. This weak inequality holds with equality if and only if x = A + b + ( I A + A ) w {\displaystyle x=A^{+}b+\left(I-A^{+}A\right)w} for any vector ⁠ w {\displaystyle w} ⁠; this provides an infinitude of minimizing solutions unless ⁠ A {\displaystyle A} ⁠ has full column rank, in which case ⁠ ( I A + A ) {\displaystyle \left(I-A^{+}A\right)} ⁠ is a zero matrix. The solution with minimum Euclidean norm is ⁠ z . {\displaystyle z.}

This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let ⁠ B K m × p {\displaystyle B\in \mathbb {K} ^{m\times p}} ⁠.

  • X K n × p {\displaystyle \forall X\in \mathbb {K} ^{n\times p}} ⁠, we have A X B F A Z B F {\displaystyle \|AX-B\|_{\mathrm {F} }\geq \|AZ-B\|_{\mathrm {F} }} where Z = A + B {\displaystyle Z=A^{+}B} and F {\displaystyle \|\cdot \|_{\mathrm {F} }} denotes the Frobenius norm.

Obtaining all solutions of a linear system

If the linear system

A x = b {\displaystyle Ax=b}

has any solutions, they are all given by

x = A + b + [ I A + A ] w {\displaystyle x=A^{+}b+\leftw}

for arbitrary vector ⁠ w {\displaystyle w} ⁠. Solution(s) exist if and only if A A + b = b {\displaystyle AA^{+}b=b} . If the latter holds, then the solution is unique if and only if ⁠ A {\displaystyle A} ⁠ has full column rank, in which case ⁠ I A + A {\displaystyle I-A^{+}A} ⁠ is a zero matrix. If solutions exist but ⁠ A {\displaystyle A} ⁠ does not have full column rank, then we have an indeterminate system, all of whose infinitude of solutions are given by this last equation.

Minimum norm solution to a linear system

For linear systems A x = b , {\displaystyle Ax=b,} with non-unique solutions (such as under-determined systems), the pseudoinverse may be used to construct the solution of minimum Euclidean norm x 2 {\displaystyle \|x\|_{2}} among all solutions.

  • If A x = b {\displaystyle Ax=b} is satisfiable, the vector z = A + b {\displaystyle z=A^{+}b} is a solution, and satisfies z 2 x 2 {\displaystyle \|z\|_{2}\leq \|x\|_{2}} for all solutions.

This result is easily extended to systems with multiple right-hand sides, when the Euclidean norm is replaced by the Frobenius norm. Let ⁠ B K m × p {\displaystyle B\in \mathbb {K} ^{m\times p}} ⁠.

  • If A X = B {\displaystyle AX=B} is satisfiable, the matrix Z = A + B {\displaystyle Z=A^{+}B} is a solution, and satisfies Z F X F {\displaystyle \|Z\|_{\mathrm {F} }\leq \|X\|_{\mathrm {F} }} for all solutions.

Condition number

Using the pseudoinverse and a matrix norm, one can define a condition number for any matrix: cond ( A ) = A A + . {\displaystyle {\mbox{cond}}(A)=\|A\|\left\|A^{+}\right\|.}

A large condition number implies that the problem of finding least-squares solutions to the corresponding system of linear equations is ill-conditioned in the sense that small errors in the entries of ⁠ A {\displaystyle A} ⁠ can lead to huge errors in the entries of the solution.

Generalizations

The weighted pseudoinverse generalizes the Moore-Penrose inverse between metric spaces with weight matrices in the domain and range. These weights are the identity for the standard Moore-Penrose inverse, which assumes an orthonormal basis in both spaces.

In order to solve more general least-squares problems, one can define Moore–Penrose inverses for all continuous linear operators ⁠ A : H 1 H 2 {\displaystyle A:H_{1}\rightarrow H_{2}} ⁠ between two Hilbert spaces H 1 {\displaystyle H_{1}} ⁠ and ⁠ H 2 {\displaystyle H_{2}} ⁠, using the same four conditions as in our definition above. It turns out that not every continuous linear operator has a continuous linear pseudoinverse in this sense. Those that do are precisely the ones whose range is closed in ⁠ H 2 {\displaystyle H_{2}} ⁠.

A notion of pseudoinverse exists for matrices over an arbitrary field equipped with an arbitrary involutive automorphism. In this more general setting, a given matrix doesn't always have a pseudoinverse. The necessary and sufficient condition for a pseudoinverse to exist is that rank ( A ) = rank ( A A ) = rank ( A A ) {\displaystyle \operatorname {rank} (A)=\operatorname {rank} \left(A^{*}A\right)=\operatorname {rank} \left(AA^{*}\right)} , where A {\displaystyle A^{*}} denotes the result of applying the involution operation to the transpose of A {\displaystyle A} . When it does exist, it is unique. Example: Consider the field of complex numbers equipped with the identity involution (as opposed to the involution considered elsewhere in the article); do there exist matrices that fail to have pseudoinverses in this sense? Consider the matrix A = [ 1 i ] T {\displaystyle A={\begin{bmatrix}1&i\end{bmatrix}}^{\operatorname {T} }} . Observe that rank ( A A T ) = 1 {\displaystyle \operatorname {rank} \left(AA^{\operatorname {T} }\right)=1} while rank ( A T A ) = 0 {\displaystyle \operatorname {rank} \left(A^{\operatorname {T} }A\right)=0} . So this matrix doesn't have a pseudoinverse in this sense.

In abstract algebra, a Moore–Penrose inverse may be defined on a *-regular semigroup. This abstract definition coincides with the one in linear algebra.

See also

Notes

  1. Moore, E. H. (1920). "On the reciprocal of the general algebraic matrix". Bulletin of the American Mathematical Society. 26 (9): 394–95. doi:10.1090/S0002-9904-1920-03322-7.
  2. Bjerhammar, Arne (1951). "Application of calculus of matrices to method of least squares; with special references to geodetic calculations". Trans. Roy. Inst. Tech. Stockholm. 49.
  3. ^ Penrose, Roger (1955). "A generalized inverse for matrices". Proceedings of the Cambridge Philosophical Society. 51 (3): 406–13. Bibcode:1955PCPS...51..406P. doi:10.1017/S0305004100030401.
  4. ^ Golub, Gene H.; Charles F. Van Loan (1996). Matrix computations (3rd ed.). Baltimore: Johns Hopkins. pp. 257–258. ISBN 978-0-8018-5414-9.
  5. Campbell & Meyer 1991.
  6. ^ Stoer, Josef; Bulirsch, Roland (2002). Introduction to Numerical Analysis (3rd ed.). Berlin, New York: Springer-Verlag. ISBN 978-0-387-95452-3..
  7. Greville, T. N. E. (1966-10-01). "Note on the Generalized Inverse of a Matrix Product". SIAM Review. 8 (4): 518–521. Bibcode:1966SIAMR...8..518G. doi:10.1137/1008107. ISSN 0036-1445.
  8. Maciejewski, Anthony A.; Klein, Charles A. (1985). "Obstacle Avoidance for Kinematically Redundant Manipulators in Dynamically Varying Environments". International Journal of Robotics Research. 4 (3): 109–117. doi:10.1177/027836498500400308. hdl:10217/536. S2CID 17660144.
  9. Rakočević, Vladimir (1997). "On continuity of the Moore–Penrose and Drazin inverses" (PDF). Matematički Vesnik. 49: 163–72.
  10. Golub, G. H.; Pereyra, V. (April 1973). "The Differentiation of Pseudo-Inverses and Nonlinear Least Squares Problems Whose Variables Separate". SIAM Journal on Numerical Analysis. 10 (2): 413–32. Bibcode:1973SJNA...10..413G. doi:10.1137/0710036. JSTOR 2156365.
  11. Hjørungnes, Are (2011). Complex-valued matrix derivatives: with applications in signal processing and communications. New York: Cambridge university press. p. 52. ISBN 9780521192644.
  12. Liu, Shuangzhe; Trenkler, Götz; Kollo, Tõnu; von Rosen, Dietrich; Baksalary, Oskar Maria (2023). "Professor Heinz Neudecker and matrix differential calculus". Statistical Papers. 65 (4): 2605–2639. doi:10.1007/s00362-023-01499-w.
  13. ^ Ben-Israel & Greville 2003.
  14. ^ Bajo, I. (2021). "Computing Moore–Penrose Inverses with Polynomials in Matrices". American Mathematical Monthly. 128 (5): 446–456. doi:10.1080/00029890.2021.1886840. hdl:11093/6146.
  15. Stallings, W. T.; Boullion, T. L. (1972). "The Pseudoinverse of an r-Circulant Matrix". Proceedings of the American Mathematical Society. 34 (2): 385–88. doi:10.2307/2038377. JSTOR 2038377.
  16. Linear Systems & Pseudo-Inverse
  17. Ben-Israel, Adi; Cohen, Dan (1966). "On Iterative Computation of Generalized Inverses and Associated Projections". SIAM Journal on Numerical Analysis. 3 (3): 410–19. Bibcode:1966SJNA....3..410B. doi:10.1137/0703035. JSTOR 2949637.pdf
  18. Söderström, Torsten; Stewart, G. W. (1974). "On the Numerical Properties of an Iterative Method for Computing the Moore–Penrose Generalized Inverse". SIAM Journal on Numerical Analysis. 11 (1): 61–74. Bibcode:1974SJNA...11...61S. doi:10.1137/0711008. JSTOR 2156431.
  19. Gramß, Tino (1992). Worterkennung mit einem künstlichen neuronalen Netzwerk (PhD dissertation). Georg-August-Universität zu Göttingen. OCLC 841706164.
  20. Emtiyaz, Mohammad (February 27, 2008). "Updating Inverse of a Matrix When a Column is Added/Removed" (PDF).
  21. Meyer, Carl D. Jr. (1973). "Generalized inverses and ranks of block matrices". SIAM J. Appl. Math. 25 (4): 597–602. doi:10.1137/0125057.
  22. Meyer, Carl D. Jr. (1973). "Generalized inversion of modified matrices". SIAM J. Appl. Math. 24 (3): 315–23. doi:10.1137/0124033.
  23. "R: Generalized Inverse of a Matrix".
  24. "LinearAlgebra.pinv".
  25. Penrose, Roger (1956). "On best approximate solution of linear matrix equations". Proceedings of the Cambridge Philosophical Society. 52 (1): 17–19. Bibcode:1956PCPS...52...17P. doi:10.1017/S0305004100030929. S2CID 122260851.
  26. ^ Planitz, M. (October 1979). "Inconsistent systems of linear equations". Mathematical Gazette. 63 (425): 181–85. doi:10.2307/3617890. JSTOR 3617890. S2CID 125601192.
  27. ^ James, M. (June 1978). "The generalised inverse". Mathematical Gazette. 62 (420): 109–14. doi:10.1017/S0025557200086460. S2CID 126385532.
  28. ^ Hagen, Roland; Roch, Steffen; Silbermann, Bernd (2001). "Section 2.1.2". C*-algebras and Numerical Analysis. CRC Press.
  29. Price, Charles M. (1963-03-15). "The Matrix Pseudoinverse and Minimal Variance Estimates". SIAM Review. 6 (2): 115–120. doi:10.1137/1006029. ISSN 1095-7200.
  30. Pearl, Martin H. (1968-10-01). "Generalized inverses of matrices with entries taken from an arbitrary field". Linear Algebra and Its Applications. 1 (4): 571–587. doi:10.1016/0024-3795(68)90028-1. ISSN 0024-3795.

References

External links

Numerical linear algebra
Key concepts
Problems
Hardware
Software
Roger Penrose
Books
Coauthored books
Concepts
Related
Categories: