Exact test

An exact (significance) test is a statistical test such that, if the null hypothesis is true, all assumptions made in deriving the distribution of the test statistic are met. An exact test therefore keeps the type I error rate at or below the chosen significance level $\alpha$ . For example, an exact test at a significance level of $\alpha =5\%$ , when repeated over many samples for which the null hypothesis is true, will reject at most $5\%$ of the time. This is in contrast to an approximate test, in which the desired type I error rate is only approximately maintained (i.e., the test might reject more than $5\%$ of the time), although the approximation can be made as close to $\alpha$ as desired by making the sample size sufficiently large.

Exact tests that are based on discrete test statistics may be conservative, meaning that the actual rejection rate lies below the nominal significance level $\alpha$ . This is the case, for example, for Fisher's exact test and its more powerful alternative, Boschloo's test. If the test statistic is continuous, the test attains the significance level exactly.
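The conservativeness of a discrete exact test can be illustrated with a small calculation. The following is a minimal sketch, assuming a one-sided exact binomial test of a success probability of 0.5 with 10 trials; the function names and the choice of test are illustrative assumptions, not part of any particular library.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def upper_tail_p_value(x, n, p0):
    """Exact one-sided p-value: P(X >= x) under the null value p0."""
    return sum(binom_pmf(k, n, p0) for k in range(x, n + 1))


def actual_size(n, p0, alpha):
    """Probability, under the null, that the exact p-value is <= alpha."""
    return sum(
        binom_pmf(x, n, p0)
        for x in range(n + 1)
        if upper_tail_p_value(x, n, p0) <= alpha
    )


# Hypothetical setting: 10 trials, null success probability 0.5, nominal 5% level.
size = actual_size(n=10, p0=0.5, alpha=0.05)
print(f"nominal level: 0.05, actual rejection rate: {size:.4f}")
```

In this setting the test can only reject when 9 or 10 successes are observed, so the actual rejection rate is 11/1024 (about 1.1%), well below the nominal 5%.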

Parametric tests, such as those used in exact statistics, are exact tests when the parametric assumptions are fully met, but in practice the term exact (significance) test is reserved for non-parametric tests, i.e., tests that do not rest on parametric assumptions. However, most software implementations of non-parametric tests use asymptotic algorithms to obtain the significance value, which renders the test non-exact.

Hence, when a statistical analysis is described as an “exact test” or as reporting an “exact p-value”, this implies that the test is defined without parametric assumptions and is evaluated without approximate algorithms. In principle, it could also mean that a parametric test has been used in a situation where all parametric assumptions are fully met, but in most real-world situations this is impossible to prove completely. Exceptions in which parametric tests are certain to be exact include tests based on the binomial or Poisson distributions. The term permutation test is sometimes used as a synonym for exact test, but while all permutation tests are exact tests, not all exact tests are permutation tests.

Formulation

The basic equation underlying exact tests is

\Pr({\text{exact}})=\sum _{\mathbf {y} \,:\,T(\mathbf {y} )\geq T(\mathbf {x} )}\Pr(\mathbf {y} )

where:

x is the actual observed outcome,
Pr(y) is the probability under the null hypothesis of a potentially observed outcome y,
T(y) is the value of the test statistic for an outcome y, with larger values of T representing cases which notionally represent greater departures from the null hypothesis,

and where the sum ranges over all outcomes y (including the observed one) that have the same value of the test statistic obtained for the observed sample x, or a larger one.
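The sum above translates directly into code when the set of possible outcomes is small enough to enumerate. The following is a minimal sketch; the function `exact_p_value` and the coin-tossing example (10 tosses of a fair coin, with T measuring the distance of the number of heads from 5) are illustrative assumptions.

```python
from math import comb


def exact_p_value(outcomes, prob, stat, observed):
    """Sum Pr(y) over all outcomes y with T(y) >= T(x).

    outcomes: iterable of every possible outcome y
    prob:     function giving Pr(y) under the null hypothesis
    stat:     test statistic T; larger values mean greater departure
    observed: the actually observed outcome x
    """
    t_obs = stat(observed)
    return sum(prob(y) for y in outcomes if stat(y) >= t_obs)


# Hypothetical example: number of heads in 10 tosses of a fair coin.
n = 10
p_value = exact_p_value(
    outcomes=range(n + 1),
    prob=lambda y: comb(n, y) / 2**n,   # binomial(n, 1/2) probabilities
    stat=lambda y: abs(y - n / 2),      # distance from the expected count
    observed=8,
)
print(p_value)
```

Here the outcomes at least as extreme as 8 heads are 0, 1, 2, 8, 9 and 10 heads, so the sum is 112/1024.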

Example: Pearson's chi-squared test versus an exact test

Main article: Pearson's chi-squared test

A simple example of this concept is the observation that Pearson's chi-squared test is an approximate test. Suppose Pearson's chi-squared test is used to ascertain whether a six-sided die is "fair", meaning that it produces each of the six possible outcomes equally often. If the die is thrown n times, then one "expects" to see each outcome n/6 times. The test statistic is

\sum {\frac {({\text{observed}}-{\text{expected}})^{2}}{\text{expected}}}=\sum _{k=1}^{6}{\frac {(X_{k}-n/6)^{2}}{n/6}},

where X_k is the number of times outcome k is observed. If the null hypothesis of "fairness" is true, then the probability distribution of the test statistic can be made as close as desired to the chi-squared distribution with 5 degrees of freedom by making the sample size n sufficiently large. If n is small, however, probabilities based on the chi-squared distribution may not be sufficiently close approximations. Finding the exact probability that the test statistic exceeds a certain value then requires a combinatorial enumeration of all outcomes of the experiment that give rise to such a large value of the test statistic. It is then questionable whether the same test statistic ought to be used: a likelihood-ratio test might be preferred, and its test statistic might not be a monotone function of the one above.
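For a small number of throws, the combinatorial enumeration described above is feasible. The following is a minimal sketch that keeps Pearson's statistic but replaces the chi-squared approximation with exact multinomial probabilities; the function names and the observed counts are hypothetical, and the closed-form tail probability for 5 degrees of freedom is included only for comparison.

```python
from fractions import Fraction
from math import erfc, exp, factorial, pi, sqrt


def compositions(total, parts):
    """Yield every tuple of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def pearson_statistic(counts):
    """Pearson's statistic for equal expected counts, as an exact fraction.

    sum (X_k - n/m)^2 / (n/m) simplifies to (m/n) * sum X_k^2 - n.
    """
    n, m = sum(counts), len(counts)
    return Fraction(m * sum(c * c for c in counts), n) - n


def multinomial_probability(counts):
    """Probability of the counts when all faces are equally likely."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)
    return Fraction(coefficient, len(counts) ** n)


def exact_die_p_value(observed):
    """Sum the null probabilities of all outcomes with statistic >= observed."""
    t_obs = pearson_statistic(observed)
    return sum(
        multinomial_probability(y)
        for y in compositions(sum(observed), len(observed))
        if pearson_statistic(y) >= t_obs
    )


def chi2_sf_5df(x):
    """Upper tail of the chi-squared distribution with 5 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2) * (1 + x / 3)


# Hypothetical data: 10 throws, face 1 seen five times, every other face once.
observed = (5, 1, 1, 1, 1, 1)
t = pearson_statistic(observed)
print("statistic:", float(t))
print("exact p-value:", float(exact_die_p_value(observed)))
print("chi-squared approximation:", chi2_sf_5df(float(t)))
```

Exact fractions are used so that ties in the statistic are compared without rounding error. The number of outcomes to enumerate grows quickly with n, so this direct approach is only practical for small samples.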

Example: Fisher's exact test

Main article: Fisher's exact test

Fisher's exact test, based on the work of Ronald Fisher and E. J. G. Pitman in the 1930s, is exact because the sampling distribution (conditional on the marginals) is known exactly. This should be compared with Pearson's chi-squared test, which (although it tests the same null) is not exact because the distribution of the test statistic is only asymptotically correct.
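Because the conditional distribution is hypergeometric, the p-value for a 2×2 table can be computed by summing exact probabilities. The following is a minimal sketch; the function names are hypothetical, and the two-sided rule used here (summing all tables that are no more probable than the observed one) is one common convention, assumed for illustration.

```python
from fractions import Fraction
from math import comb


def table_probability(a, row1, row2, col1):
    """Hypergeometric probability that the top-left cell equals a, given the margins."""
    return Fraction(comb(row1, a) * comb(row2, col1 - a), comb(row1 + row2, col1))


def fisher_exact(a, b, c, d):
    """One-sided and two-sided exact p-values for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    lowest = max(0, col1 - row2)    # smallest top-left count the margins allow
    highest = min(row1, col1)       # largest top-left count the margins allow
    probs = {
        k: table_probability(k, row1, row2, col1)
        for k in range(lowest, highest + 1)
    }
    # One-sided: tables whose top-left cell is at least as large as observed.
    one_sided = sum(p for k, p in probs.items() if k >= a)
    # Two-sided (assumed convention): tables no more probable than the observed one.
    two_sided = sum(p for p in probs.values() if p <= probs[a])
    return one_sided, two_sided


# Hypothetical table with all four margins equal to 4.
one_sided, two_sided = fisher_exact(3, 1, 1, 3)
print(one_sided, two_sided)  # 17/70 17/35
```

No large-sample approximation is involved at any step: every probability is an exact ratio of binomial coefficients.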

