Marchenko–Pastur distribution

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.
Distribution of singular values of large rectangular random matrices
Plot of the Marchenko-Pastur distribution for various values of lambda

In the mathematical theory of random matrices, the Marchenko–Pastur distribution, or Marchenko–Pastur law, describes the asymptotic behavior of singular values of large rectangular random matrices. The theorem is named after the Soviet mathematicians Volodymyr Marchenko and Leonid Pastur, who proved this result in 1967.

If $X$ denotes an $m \times n$ random matrix whose entries are independent, identically distributed random variables with mean 0 and variance $\sigma^2 < \infty$, let

$$Y_n = \frac{1}{n} X X^T$$

and let $\lambda_1, \lambda_2, \dots, \lambda_m$ be the eigenvalues of $Y_n$ (viewed as random variables). Finally, consider the random measure

$$\mu_m(A) = \frac{1}{m} \#\left\{\lambda_j \in A\right\}, \quad A \subset \mathbb{R},$$

which gives the fraction of the eigenvalues that lie in the subset $A$ of $\mathbb{R}$.

Theorem. Assume that $m, n \to \infty$ so that the ratio $m/n \to \lambda \in (0, +\infty)$. Then $\mu_m \to \mu$ (in the weak* topology, that is, in distribution), where

$$\mu(A) = \begin{cases} \left(1 - \frac{1}{\lambda}\right) \mathbf{1}_{0 \in A} + \nu(A), & \text{if } \lambda > 1 \\ \nu(A), & \text{if } 0 \leq \lambda \leq 1, \end{cases}$$

and

$$d\nu(x) = \frac{1}{2\pi\sigma^2} \frac{\sqrt{(\lambda_+ - x)(x - \lambda_-)}}{\lambda x} \, \mathbf{1}_{x \in [\lambda_-, \lambda_+]} \, dx$$

with

$$\lambda_\pm = \sigma^2 (1 \pm \sqrt{\lambda})^2.$$
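The theorem can be checked empirically. The following is a minimal sketch, assuming standard Gaussian entries (so $\sigma^2 = 1$) and an arbitrary choice of $m = 500$, $n = 2000$; the function name `mp_density` and the histogram settings are illustrative, not part of the theorem. It compares a histogram of the eigenvalues of $Y_n$ with the density of $\nu$.

```python
import numpy as np

def mp_density(x, lam, sigma2=1.0):
    """Density of the continuous part nu of the Marchenko-Pastur law."""
    lo = sigma2 * (1 - np.sqrt(lam)) ** 2   # lambda_minus
    hi = sigma2 * (1 + np.sqrt(lam)) ** 2   # lambda_plus
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = (x > lo) & (x < hi)
    xi = x[inside]
    out[inside] = np.sqrt((hi - xi) * (xi - lo)) / (2 * np.pi * sigma2 * lam * xi)
    return out

# Assumed setup: i.i.d. standard normal entries, m/n = 0.25.
rng = np.random.default_rng(0)
m, n = 500, 2000
lam = m / n
X = rng.standard_normal((m, n))
Y = X @ X.T / n
eigs = np.linalg.eigvalsh(Y)            # eigenvalues of the symmetric matrix Y_n

# Compare the empirical spectral histogram with the limiting density.
hist, edges = np.histogram(eigs, bins=30, range=(0.0, 2.5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
max_gap = np.max(np.abs(hist - mp_density(centers, lam)))
print(f"eigenvalue range: [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"largest histogram/density gap: {max_gap:.3f}")
```

With these settings the support predicted by the theorem is $[0.25, 2.25]$, and the eigenvalues should fall close to that interval up to finite-size fluctuations.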

The Marchenko–Pastur law also arises as the free Poisson law in free probability theory, having rate $1/\lambda$ and jump size $\sigma^2$.

Moments

For each $k \geq 1$, its $k$-th moment is

$$\sum_{r=0}^{k-1} \frac{\sigma^{2k}}{r+1} \binom{k}{r} \binom{k-1}{r} \lambda^r = \frac{\sigma^{2k}}{k} \sum_{r=0}^{k-1} \binom{k}{r} \binom{k}{r+1} \lambda^r$$
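As a sanity check, the moment formula can be compared with a direct numerical integration of $x^k$ against the density of $\nu$ (the atom at 0, when present, contributes nothing to moments with $k \geq 1$). This is a sketch only: the helper names are hypothetical, and the integral uses the substitution $x = c + r\cos\theta$ with a midpoint rule, which is one convenient choice among many.

```python
import math
import numpy as np

def mp_moment(k, lam, sigma2=1.0):
    """k-th moment from the closed-form sum (sigma2**k equals sigma^(2k))."""
    return sum(
        sigma2**k / (r + 1) * math.comb(k, r) * math.comb(k - 1, r) * lam**r
        for r in range(k)
    )

def mp_moment_numeric(k, lam, sigma2=1.0, num=20000):
    """k-th moment by integrating x**k against the density of nu."""
    lo = sigma2 * (1 - math.sqrt(lam)) ** 2
    hi = sigma2 * (1 + math.sqrt(lam)) ** 2
    c, r = 0.5 * (hi + lo), 0.5 * (hi - lo)
    # Substituting x = c + r*cos(theta) turns sqrt((hi-x)(x-lo)) dx
    # into (r*sin(theta))**2 dtheta, which is smooth on [0, pi].
    theta = (np.arange(num) + 0.5) * np.pi / num
    x = c + r * np.cos(theta)
    weight = (r * np.sin(theta)) ** 2 / (2 * np.pi * sigma2 * lam * x)
    return float(np.sum(x**k * weight) * np.pi / num)

for k in (1, 2, 3):
    print(k, mp_moment(k, 0.5, 2.0), mp_moment_numeric(k, 0.5, 2.0))
```

For $k = 1$ and $k = 2$ the sum reduces to $\sigma^2$ and $\sigma^4(1 + \lambda)$ respectively.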

Some transforms of this law

The Stieltjes transform is given by

$$s(z) = \frac{\sigma^2 (1 - \lambda) - z + \sqrt{(z - \sigma^2(\lambda + 1))^2 - 4\lambda\sigma^4}}{2\lambda z \sigma^2}$$

for complex numbers z of positive imaginary part, where the complex square root is also taken to have positive imaginary part. The Stieltjes transform can be repackaged in the form of the R-transform, which is given by

$$R(z) = \frac{\sigma^2}{1 - \sigma^2 \lambda z}$$

The S-transform is given by

$$S(z) = \frac{1}{\sigma^2 (1 + \lambda z)}.$$
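These transforms can be checked numerically against the measure itself. The sketch below assumes the conventions $s(z) = \int \frac{d\mu(x)}{x - z}$ for the Stieltjes transform and $G(z) = \int \frac{d\mu(x)}{z - x}$ for the Cauchy transform, with the R-transform defined by $G(R(w) + 1/w) = w$; the article does not spell these conventions out, so they are assumptions here. The closed form for $s(z)$ is a root of the quadratic $\lambda\sigma^2 z\, s^2 + (z - \sigma^2(1 - \lambda))\, s + 1 = 0$, so the code tests that equation rather than a particular branch of the square root. The helper `mp_integral` is hypothetical.

```python
import numpy as np

def mp_integral(f, lam, sigma2=1.0, num=20000):
    """Integrate f against the Marchenko-Pastur law mu (midpoint rule)."""
    lo = sigma2 * (1 - np.sqrt(lam)) ** 2
    hi = sigma2 * (1 + np.sqrt(lam)) ** 2
    c, r = 0.5 * (hi + lo), 0.5 * (hi - lo)
    theta = (np.arange(num) + 0.5) * np.pi / num
    x = c + r * np.cos(theta)              # x sweeps the support [lo, hi]
    weight = (r * np.sin(theta)) ** 2 / (2 * np.pi * sigma2 * lam * x)
    total = np.sum(f(x) * weight) * np.pi / num
    if lam > 1:                            # atom at 0 with mass 1 - 1/lam
        total += (1 - 1 / lam) * f(0.0)
    return total

lam, sigma2 = 0.5, 2.0                     # arbitrary illustrative values

# Stieltjes transform at a point in the upper half-plane (assumed convention).
z = 1.0 + 0.5j
s = mp_integral(lambda x: 1 / (x - z), lam, sigma2)
residual = lam * sigma2 * z * s**2 + (z - sigma2 * (1 - lam)) * s + 1

# R-transform: the Cauchy transform evaluated at R(w) + 1/w should return w.
w = 0.05
R = sigma2 / (1 - sigma2 * lam * w)
z_real = R + 1 / w
G = mp_integral(lambda x: 1 / (z_real - x), lam, sigma2)

print("quadratic residual:", abs(residual))
print("G(R(w) + 1/w) =", G, "vs w =", w)
```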

For the case of $\sigma = 1$, the $\eta$-transform is given by $\mathbb{E}\,\frac{1}{1 + \gamma X}$, where $X$ follows the Marchenko–Pastur law.

$$\eta(\gamma) = 1 - \frac{\mathcal{F}(\gamma, \lambda)}{4\gamma\lambda}$$

where $\mathcal{F}(x, z) = \left(\sqrt{x(1 + \sqrt{z})^2 + 1} - \sqrt{x(1 - \sqrt{z})^2 + 1}\right)^2$.

For exact analysis of high-dimensional regression in the proportional asymptotic regime, a convenient form is often $T(u) := \eta\left(\tfrac{1}{u}\right)$, which simplifies to

$$T(u) = \frac{-1 + \lambda - u + \sqrt{(1 + u - \lambda)^2 + 4u\lambda}}{2\lambda}$$
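The agreement between $\eta(1/u)$, the simplified form of $T(u)$, and the defining expectation $\mathbb{E}\,\frac{u}{X + u}$ can be verified numerically. This is a minimal sketch for $\sigma = 1$; the helper `mp_expect` is hypothetical and integrates against the law by a midpoint rule, adding the atom at 0 when $\lambda > 1$.

```python
import numpy as np

def eta(gamma, lam):
    """eta-transform of the Marchenko-Pastur law with sigma = 1."""
    a = np.sqrt(gamma * (1 + np.sqrt(lam)) ** 2 + 1)
    b = np.sqrt(gamma * (1 - np.sqrt(lam)) ** 2 + 1)
    return 1 - (a - b) ** 2 / (4 * gamma * lam)

def T(u, lam):
    """Simplified closed form of eta(1/u)."""
    return (-1 + lam - u + np.sqrt((1 + u - lam) ** 2 + 4 * u * lam)) / (2 * lam)

def mp_expect(f, lam, num=20000):
    """E[f(X)] for X following the Marchenko-Pastur law with sigma = 1."""
    lo, hi = (1 - np.sqrt(lam)) ** 2, (1 + np.sqrt(lam)) ** 2
    c, r = 0.5 * (hi + lo), 0.5 * (hi - lo)
    theta = (np.arange(num) + 0.5) * np.pi / num
    x = c + r * np.cos(theta)
    weight = (r * np.sin(theta)) ** 2 / (2 * np.pi * lam * x)
    total = np.sum(f(x) * weight) * np.pi / num
    if lam > 1:                       # atom at 0 with mass 1 - 1/lam
        total += (1 - 1 / lam) * f(0.0)
    return float(total)

u = 0.7                               # arbitrary illustrative value
for lam in (0.5, 2.0):
    direct = mp_expect(lambda x: u / (x + u), lam)
    print(lam, eta(1 / u, lam), T(u, lam), direct)
```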

The functions $B(u) := \mathbb{E}\left(\frac{u}{X + u}\right)^2$ and $V(u) := \mathbb{E}\,\frac{X}{(X + u)^2}$, where $X$ follows the Marchenko–Pastur law, appear in the limiting bias and variance, respectively, of ridge regression and other regularized linear regression problems. One can show that $B(u) = T(u) - u \cdot T'(u)$ and $V(u) = T'(u)$.
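Both identities follow from $T(u) = \mathbb{E}\,\frac{u}{X + u}$ by differentiating under the expectation, and they can be checked numerically. The sketch below assumes $\sigma = 1$, takes $V(u)$ as an expectation, and uses the derivative $T'(u) = \frac{1}{2\lambda}\left(-1 + \frac{1 + u + \lambda}{\sqrt{(1 + u - \lambda)^2 + 4u\lambda}}\right)$ obtained by differentiating the closed form of $T$; the helper names are hypothetical.

```python
import numpy as np

def T(u, lam):
    """Closed form of T(u) = eta(1/u) for sigma = 1."""
    return (-1 + lam - u + np.sqrt((1 + u - lam) ** 2 + 4 * u * lam)) / (2 * lam)

def T_prime(u, lam):
    """Derivative of T, obtained by differentiating the closed form."""
    root = np.sqrt((1 + u - lam) ** 2 + 4 * u * lam)
    return (-1 + (1 + u + lam) / root) / (2 * lam)

def mp_expect(f, lam, num=20000):
    """E[f(X)] for X following the Marchenko-Pastur law with sigma = 1."""
    lo, hi = (1 - np.sqrt(lam)) ** 2, (1 + np.sqrt(lam)) ** 2
    c, r = 0.5 * (hi + lo), 0.5 * (hi - lo)
    theta = (np.arange(num) + 0.5) * np.pi / num
    x = c + r * np.cos(theta)
    weight = (r * np.sin(theta)) ** 2 / (2 * np.pi * lam * x)
    total = np.sum(f(x) * weight) * np.pi / num
    if lam > 1:                       # atom at 0 with mass 1 - 1/lam
        total += (1 - 1 / lam) * f(0.0)
    return float(total)

u, lam = 0.7, 0.5                     # arbitrary illustrative values
B_closed = T(u, lam) - u * T_prime(u, lam)
V_closed = T_prime(u, lam)
B_numeric = mp_expect(lambda x: (u / (x + u)) ** 2, lam)
V_numeric = mp_expect(lambda x: x / (x + u) ** 2, lam)
print("B:", B_closed, B_numeric)
print("V:", V_closed, V_numeric)
```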

Application to correlation matrices

For the special case of correlation matrices, $\sigma^2 = 1$ and $\lambda = m/n$. This bounds the probability mass over the interval defined by

$$\lambda_\pm = \left(1 \pm \sqrt{\frac{m}{n}}\right)^2.$$

Since this distribution describes the spectrum of random matrices with mean 0, the eigenvalues of correlation matrices that fall inside the aforementioned interval can be considered spurious, or noise. For instance, a correlation matrix of 10 stock returns calculated over a period of 252 trading days gives $\lambda_+ = \left(1 + \sqrt{\frac{10}{252}}\right)^2 \approx 1.44$. Thus, out of the 10 eigenvalues of that correlation matrix, only those greater than 1.44 would be considered significantly different from random.
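The filtering rule can be illustrated on synthetic data. The sketch below uses made-up returns, not real market data: ten hypothetical assets share a common "market" factor with an arbitrarily chosen loading of 0.5, so one eigenvalue is expected to stand out above the noise band while the rest stay near or below $\lambda_+$.

```python
import numpy as np

m, n = 10, 252                          # 10 assets, 252 trading days
lam_plus = (1 + np.sqrt(m / n)) ** 2    # upper edge of the noise band
lam_minus = (1 - np.sqrt(m / n)) ** 2   # lower edge of the noise band

# Synthetic returns (assumption): a shared factor plus independent noise.
rng = np.random.default_rng(1)
market = rng.standard_normal(n)
returns = 0.5 * market + rng.standard_normal((m, n))   # rows are assets

corr = np.corrcoef(returns)             # m x m sample correlation matrix
eigs = np.linalg.eigvalsh(corr)
signal = eigs[eigs > lam_plus]          # eigenvalues above the noise band

print(f"noise band: [{lam_minus:.3f}, {lam_plus:.3f}]")
print("eigenvalues above the band:", np.round(signal, 3))
```

In finite samples a purely random matrix can still produce eigenvalues slightly above $\lambda_+$, so the threshold is a guide rather than a formal test.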

References

  1. Bai & Silverstein 2010, Section 3.1.1.
  2. Bai & Silverstein 2010, Section 3.3.1.
  3. Tulino & Verdú 2004, Section 2.2.