Least-squares spectral analysis

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.

Revision as of 16:56, 28 October 2007

Least-squares spectral analysis (LSSA) is a method of estimating a frequency spectrum, based on a least-squares fit between data and trigonometric functions. Fourier analysis, the most widely used spectral method in science, generally boosts long-periodic noise in long gapped records; LSSA mitigates such problems and is a superior alternative for analyzing long incomplete records, such as most natural datasets.

LSSA is also known as the Vaníček method after Canadian geodesist and geophysicist Petr Vaníček (sometimes Vaníček spectral analysis or Gauss–Vaníček spectral analysis), and as the Lomb method (or the Lomb periodogram) and the Lomb–Scargle method (or Lomb–Scargle periodogram), based on the contributions of Nicholas R. Lomb and, independently, Jeffrey D. Scargle.

Historical background

The concept of least-squares fitting of data with trigonometric functions was first mentioned briefly in a 1963 paper by Barning. It was first developed mathematically by Vaníček in 1969 and 1971. The method was then simplified in 1976 by Lomb, who pointed out its close connection to periodogram analysis. It was subsequently further modified and analyzed by Scargle, who pointed out the now-standard way to decorrelate sine and cosine components evaluated at the set of sample times.

Scargle states that his paper "does not introduce a new detection technique, but instead studies the reliability and efficiency of detection with the most commonly used technique, the periodogram, in the case where the observation times are unevenly spaced," and further points out in reference to least-squares fitting of sinusoids compared to periodogram analysis, that his paper "establishes, apparently for the first time, that (with the proposed modifications) these two methods are exactly equivalent".

Terminology

In this article, the term sinusoids can mean sine and cosine functions, or, in the Lomb–Scargle method, time-shifted (phase-shifted) sines and cosines. Vaníček and his students usually refer to them simply as trigonometric functions rather than specifically as sinusoids. This usage was introduced by Vaníček in 1971, and is explained by Wells et al. in 1985, who "define the function trig(x) as being either cos(x) or sin(x)."

The Vaníček method

In the Vaníček method, a discrete data set is approximated by a weighted sum of sinusoids of predetermined frequencies, using a standard linear regression, or least-squares fit. The number of sinusoids must be less than or equal to the number of data samples (counting sines and cosines of the same frequency as separate sinusoids).

A data vector φ is represented as a weighted sum of sinusoidal basis functions, tabulated in a matrix A by evaluating each function at the sample times, with weight vector x:

$$\phi \approx \mathbf{A}x$$

where the weight vector x is chosen to minimize the sum of squared errors in approximating φ. The solution for x is closed-form, using standard linear regression:

$$x = (\mathbf{A}^{\mathrm{T}}\mathbf{A})^{-1}\mathbf{A}^{\mathrm{T}}\phi.$$
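
As an illustration of the closed-form solution, here is a minimal Python sketch that is not taken from any of the cited sources. It tabulates cosines and sines at a few chosen frequencies, forms AᵀA and Aᵀφ, and solves for x by Gaussian elimination. The function names, the sample times and the test signal are all invented for the example. Frequencies are assumed to be in cycles per unit time, and AᵀA is assumed to be non-singular.

```python
import math

def design_matrix(times, freqs):
    """Tabulate cos and sin at each frequency, evaluated at the sample times."""
    rows = []
    for t in times:
        row = []
        for f in freqs:
            row.append(math.cos(2 * math.pi * f * t))
            row.append(math.sin(2 * math.pi * f * t))
        rows.append(row)
    return rows

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def least_squares_weights(times, data, freqs):
    """Return x = (A^T A)^{-1} A^T phi for the sinusoidal basis."""
    A = design_matrix(times, freqs)
    m = len(A[0])
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(m)] for i in range(m)]
    Atphi = [sum(row[i] * d for row, d in zip(A, data)) for i in range(m)]
    return solve(AtA, Atphi)

# Unevenly spaced sample times and a noiseless test signal (both invented).
times = [0.0, 0.7, 1.1, 2.4, 3.0, 4.6, 5.2, 6.9, 7.3, 9.8]
data = [2.0 * math.cos(2 * math.pi * 0.1 * t) - 0.5 * math.sin(2 * math.pi * 0.25 * t)
        for t in times]
weights = least_squares_weights(times, data, [0.1, 0.25])
```

Because the test signal lies exactly in the span of the chosen sinusoids, the fit should recover weights close to 2, 0, 0 and −0.5, in the order cosine and sine at 0.1, then cosine and sine at 0.25.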

Here the matrix A can be based on any set of functions that are mutually independent (not necessarily orthogonal) when evaluated at the sample times; for spectral analysis, the functions used are typically sines and cosines evenly distributed over the frequency range of interest. If too many frequencies are chosen in a too-narrow frequency range, the functions will not be sufficiently independent, the matrix will be badly conditioned, and the resulting spectrum will not be meaningful.

When the basis functions in A are orthogonal (that is, not correlated, meaning the columns have zero pairwise dot products), the matrix AᵀA is a diagonal matrix; when the columns all have the same power (sum of squares of elements), then that matrix is an identity matrix times a constant, so the inversion is trivial. The latter is the case when the sample times are equally spaced and the sinusoids are chosen to be sines and cosines equally spaced in pairs on the frequency interval 0 to a half cycle per sample (spaced by 1/N cycle per sample, omitting the sine phases at 0 and maximum frequency where they are identically zero). This particular case is known as the discrete Fourier transform, slightly rewritten in terms of real data and coefficients.

$$x = \mathbf{A}^{\mathrm{T}}\phi$$     (DFT case for N equally spaced samples and frequencies, within a scalar factor)
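
The orthogonality behind the DFT case can be checked numerically. The sketch below is illustrative only, with invented function names and N = 8 chosen arbitrarily. It builds the cosine and sine columns for equally spaced samples and computes their pairwise dot products, that is, the entries of AᵀA.

```python
import math

def dft_basis(n):
    """Cos columns at k = 0..n//2 and sin columns at the frequencies in between,
    where k is the number of cycles over a record of n equally spaced samples."""
    columns = []
    for k in range(n // 2 + 1):
        columns.append([math.cos(2 * math.pi * k * j / n) for j in range(n)])
        # The sine is identically zero at k = 0 and at the maximum frequency k = n/2.
        if 0 < k < n / 2:
            columns.append([math.sin(2 * math.pi * k * j / n) for j in range(n)])
    return columns

def gram(columns):
    """Pairwise dot products of the columns, i.e. the entries of A^T A."""
    return [[sum(a * b for a, b in zip(u, v)) for v in columns] for u in columns]

n = 8
G = gram(dft_basis(n))
# Largest off-diagonal entry: zero up to rounding when the columns are orthogonal.
off_diagonal = max(abs(G[i][j]) for i in range(n) for j in range(n) if i != j)
# Power (sum of squares) of each column.
diagonal = [round(G[i][i], 9) for i in range(n)]
```

In this sketch the off-diagonal entries vanish to rounding error. The columns at zero frequency and at the maximum frequency come out with power N rather than N/2, so the normalizing factor is not quite the same for every column.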

Lomb proposed using this simplification in general, since the correlations between pairs of sinusoids are often small, at least when they are not too closely spaced. This is essentially the traditional periodogram formulation, but now adopted for use with unevenly spaced samples. The vector x is a good estimate of an underlying spectrum, but since correlations are ignored, Ax is no longer a good approximation to the signal, and the method is no longer a least-squares method – yet it has continued to be referred to as such.

The Lomb–Scargle periodogram

Rather than just taking dot products of the data with sine and cosine waveforms directly, Scargle modified Lomb's method to first find a time delay τ such that this pair of sinusoids would be mutually orthogonal at the sample times t_j, and also adjusted for the potentially unequal powers of these two basis functions, to obtain a better estimate of the power at a frequency.
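
For reference, the periodogram described here is commonly written as follows. This is the form usually quoted in the literature, not an expression recovered from this revision of the article. The symbols ω (angular frequency) and φ_j (the data sample taken at time t_j) are notational assumptions chosen to match the data vector φ used above.

```latex
% Time delay that makes the shifted sine and cosine orthogonal at the sample times
\tan 2\omega\tau = \frac{\sum_j \sin 2\omega t_j}{\sum_j \cos 2\omega t_j}

% Power at angular frequency omega: each dot product is normalized by the power of its sinusoid
P(\omega) = \frac{1}{2}\left(
  \frac{\left[\sum_j \phi_j \cos\omega(t_j-\tau)\right]^2}{\sum_j \cos^2\omega(t_j-\tau)}
  + \frac{\left[\sum_j \phi_j \sin\omega(t_j-\tau)\right]^2}{\sum_j \sin^2\omega(t_j-\tau)}
\right)
```

With this choice of τ, the sum over j of cos ω(t_j − τ) sin ω(t_j − τ) is zero, which is the orthogonality condition stated in the text. The two denominators account for the unequal powers of the two basis functions.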

Main features

Applications

Vaníček analysis has many scientific applications, ranging from astronomy, geophysics, physics, microbiology, genetics and medicine to mathematics and finance. This wide applicability stems from many useful properties of the least-squares fit. The most useful feature of the method is that it allows incomplete records to be spectrally analyzed, without the need to manipulate the record or to invent otherwise non-existent data.

Magnitudes in the Vaníček spectrum depict the contribution of a period to the variance of the time series, of the order of several percent. Spectral magnitudes defined in this manner allow a straightforward significance-level regime for the output. Alternatively, magnitudes in the Vaníček spectrum can be expressed in dB. Note that magnitudes in the Vaníček spectrum follow the β-distribution.

Inverse transformation of Vaníček's LSSA is possible, as is most easily seen by writing the forward transform as a matrix; the matrix inverse (when the matrix is not singular) or pseudo-inverse will then be an inverse transformation; the inverse will exactly match the original data if the chosen sinusoids are mutually independent at the sample points and their number is equal to the number of data points. No such inverse procedure is known for the periodogram method.
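
The exact-inverse case can be illustrated with a small sketch in which the number of sinusoids equals the number of data points, so that A is square. The sample times, data values and function names below are invented, and the chosen sinusoids are assumed to be independent at these sample points (otherwise the elimination would divide by zero).

```python
import math

def design_matrix(times, freqs):
    """Tabulate cos and sin at each frequency, evaluated at the sample times."""
    return [[g(2 * math.pi * f * t) for f in freqs for g in (math.cos, math.sin)]
            for t in times]

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

# Four unevenly spaced samples and four sinusoids (cos and sin at two frequencies).
times = [0.0, 1.0, 2.5, 4.0]
data = [1.3, -0.4, 2.2, 0.7]
A = design_matrix(times, [0.1, 0.2])

# Forward transform: with a square, non-singular A the least-squares solution is A^{-1} phi.
x = solve(A, data)
# Inverse transform: A x reproduces the data.
reconstructed = [sum(a * w for a, w in zip(row, x)) for row in A]
max_error = max(abs(r - d) for r, d in zip(reconstructed, data))
```

Here `max_error` should be at the level of floating-point rounding. With fewer sinusoids than data points, A x would only approximate the data in the least-squares sense.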

Implementation

The LSSA can be implemented in less than a page of MATLAB code. For each frequency in a desired set of frequencies, sine and cosine functions are evaluated at the times corresponding to the data samples, and dot products of the data vector with the sinusoid vectors are taken and appropriately normalized; following the method known as the Lomb–Scargle periodogram, a time shift is calculated for each frequency to orthogonalize the sine and cosine components before the dot product, as described by Craymer; finally, a power is computed from those two amplitude components. This same process implements a discrete Fourier transform when the data are uniformly spaced in time and the frequencies chosen correspond to integer numbers of cycles over the finite data record.
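
The per-frequency procedure just described can be sketched in a few lines. The following Python version is an illustration only, not the MATLAB code the text refers to. The function name, sample times and test signal are invented, frequencies are assumed to be positive and in cycles per unit time, and any mean removal is assumed to have been done beforehand.

```python
import math

def lomb_scargle_power(times, data, freq):
    """Power at one positive frequency, using the time-shifted sine and cosine."""
    omega = 2 * math.pi * freq
    # Time shift tau that makes the shifted sine and cosine orthogonal at the sample times.
    s2 = sum(math.sin(2 * omega * t) for t in times)
    c2 = sum(math.cos(2 * omega * t) for t in times)
    tau = math.atan2(s2, c2) / (2 * omega)
    cos_terms = [math.cos(omega * (t - tau)) for t in times]
    sin_terms = [math.sin(omega * (t - tau)) for t in times]
    # Dot products of the data with each sinusoid, normalized by that sinusoid's power.
    c_dot = sum(d * c for d, c in zip(data, cos_terms))
    s_dot = sum(d * s for d, s in zip(data, sin_terms))
    c_pow = sum(c * c for c in cos_terms)
    s_pow = sum(s * s for s in sin_terms)
    return 0.5 * (c_dot ** 2 / c_pow + s_dot ** 2 / s_pow)

# Unevenly spaced sample times and a single test sinusoid at 0.2 cycles per unit time.
times = [0.3, 1.1, 1.8, 3.2, 4.0, 5.7, 6.1, 7.9, 8.4, 9.6,
         11.2, 12.5, 13.1, 14.8, 16.0, 17.3, 18.9, 19.4, 21.7, 23.0]
data = [math.sin(2 * math.pi * 0.2 * t + 0.3) for t in times]

# Evaluate the power on a frequency grid and locate the peak.
grid = [k / 100 for k in range(5, 51)]
powers = [lomb_scargle_power(times, data, f) for f in grid]
peak = grid[powers.index(max(powers))]
```

Each frequency is treated on its own, so the grid can be made as dense as desired. For this noiseless test signal the peak falls at the grid point 0.2, where the power equals half the sum of squares of the data.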

As Craymer explains, this method treats each sinusoidal component independently, or out of context, even though they may not be orthogonal on the data points, whereas Vaníček's original method does a full simultaneous least-squares fit by solving a matrix equation, partitioning the total data variance between the specified sinusoid frequencies. Such a matrix least-squares solution is natively available in MATLAB as the backslash operator.

Craymer explains that the least-squares method, as opposed to the independent or periodogram version due to Lomb, cannot fit more components (sines and cosines) than there are data samples, and further that:

"...serious repercussions can also arise if the selected frequencies result in some of the Fourier components (trig functions) becoming nearly linearly dependent with each other, thereby producing an ill-conditioned or near singular N. To avoid such ill conditioning it becomes necessary to either select a different set of frequencies to be estimated (e.g. equally spaced frequencies) or simply neglect the correlations in N (i.e., the off-diagonal blocks) and estimate the inverse least squares transform separately for the individual frequencies..."

Lomb's periodogram method, on the other hand, can use an arbitrarily high number of, or density of, frequency components, as in a standard periodogram; that is, the frequency domain can be over-sampled by an arbitrary factor.

In Fourier analysis, such as the Fourier transform or the discrete Fourier transform, the sinusoids being fitted to the data are all mutually orthogonal, so there is no distinction between the simple out-of-context dot-product-based projection onto basis functions versus a least-squares fit; that is, no matrix inversion is required to least-squares partition the variance between orthogonal sinusoids of different frequencies. This method is usually preferred for its efficient fast Fourier transform implementation, when complete data records with equally spaced samples are available.

References

  1. Craymer, M.R., The Least Squares Spectrum, Its Inverse Transform and Autocorrelation Function: Theory and Some Applications in Geodesy, Ph.D. Dissertation, University of Toronto, Canada (1998).
  2. Omerbashich M., "Gauss–Vanicek spectral analysis of the Sepkoski compendium: no new life cycles", Pages 26-30, Computing in Science & Engineering, Volume 8, Number 4, (July-August, 2006) ISSN 1521-9615.
  3. Press et al. (2007). Numerical Recipes (3rd ed.). Cambridge University Press. ISBN 0521880688.
  4. J. Taylor and S. Hamilton (1972-03-20). "Some tests of the Vaníček Method of spectral analysis". Astrophysics and Space Science.
  5. Alistair I. Mees (2001). Nonlinear Dynamics and Statistics. Springer. ISBN 0817641637.
  6. Frank Chambers (2002). Climate Change: Critical Concepts in the Environment. Routledge. ISBN 0415278589.
  7. Hans P. A. Van Dongen et al. (1999). "Searching for Biological Rhythms: Peak Detection in the Periodogram of Unequally Spaced Data". Journal of Biological Rhythms. 14 (6): 617–620.
  8. D. Scott Birney, David Oesper, and Guillermo Gonzalez (2006). Observational Astronomy. Cambridge University Press. ISBN 0521853702.
  9. Lomb, N. R., "Least-squares frequency analysis of unequally spaced data," Astrophysics and Space Science 39, p.447–462 (1976).
  10. Scargle, J. D., "Studies in astronomical time series analysis II: Statistical aspects of spectral analysis of unevenly spaced data," Astrophysical Journal 263, p.835–853 (1982).
  11. Barning, F.J.M. The numerical analysis of the light-curve of 12 Lacertae, Bulletin of the Astronomical Institutes of the Netherlands, 17, p.22-28 (1963).
  12. Vaníček P. Approximate Spectral Analysis by Least-squares Fit, Astrophysics and Space Science, Pages 387-391, Volume 4 (1969).
  13. Vaníček P. Further development and properties of the spectral analysis by least-squares fit, Astrophysics and Space Science, Pages 10-33, Volume 12 (1971).
  14. D. Wells, P. Vaníček, and S. Pagiatakis, "Least-Squares Spectral Analysis Revisited", TR84, Univ. New Brunswick (1985)
  15. Pagiatakis, S. Stochastic significance of peaks in the least-squares spectrum, J of Geodesy 73, p.67-78 (1999).
  16. Beard, A.G., Williams, P.J.S., Mitchell, N.J. & Muller, H.G. A special climatology of planetary waves and tidal variability, J Atm. Solar-Ter. Phys. 63 (09), p.801–811 (2001).
  17. Omerbashich M., Earth-Model Discrimination Method, pp.129, Ph.D. dissertation, University of New Brunswick, Canada (2003).
  18. Steeves, R.R. A statistical test for significance of peaks in the least squares spectrum, Collected Papers of the Geodetic Survey, Department of Energy, Mines and Resources, Surveys and Mapping, Ottawa, Canada, p.149-166 (1981)
  19. Richard A. Muller and Gordon J. MacDonald (2000). Ice Ages and Astronomical Causes: Data, Spectral Analysis, and Mechanisms. Springer. ISBN 3540437797.
  20. Timothy A. Davis and Kermit Sigmon (2005). MATLAB Primer. CRC Press. ISBN 1584885238.
  21. Darrell Williamson (1999). Discrete-Time Signal Processing: An Algebraic Approach. Springer. ISBN 1852331615.
