Ewens's sampling formula

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license.

In population genetics, Ewens's sampling formula describes the probabilities associated with counts of how many different alleles are observed a given number of times in the sample.

Definition

Ewens's sampling formula, introduced by Warren Ewens, states that under certain conditions (specified below), if a random sample of n gametes is taken from a population and classified according to the gene at a particular locus, then the probability that there are $a_1$ alleles represented once in the sample, $a_2$ alleles represented twice, and so on, is

$$\Pr(a_1,\dots,a_n;\theta)=\frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)}\prod_{j=1}^{n}\frac{\theta^{a_j}}{j^{a_j}\,a_j!},$$

for some positive number θ representing the population mutation rate, whenever $a_1,\ldots,a_n$ is a sequence of nonnegative integers such that

$$a_1+2a_2+3a_3+\cdots+na_n=\sum_{i=1}^{n}i\,a_i=n.$$
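The formula can be evaluated directly. The following is a minimal sketch in Python, not a reference implementation; the function name `ewens_probability` and the use of exact rational arithmetic are choices made here for illustration.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(a, theta):
    """Evaluate Ewens's sampling formula for allele counts a = (a_1, ..., a_n).

    a[j - 1] is the number of alleles seen exactly j times; theta > 0.
    The name and signature are illustrative, not taken from any library.
    """
    n = len(a)
    # The constraint a_1 + 2 a_2 + ... + n a_n = n must hold.
    if any(a_j < 0 for a_j in a) or sum(j * a_j for j, a_j in enumerate(a, start=1)) != n:
        raise ValueError("a must be nonnegative with sum of j * a_j equal to n")
    theta = Fraction(theta)
    if theta <= 0:
        raise ValueError("theta must be positive")

    # n! / (theta (theta + 1) ... (theta + n - 1))
    prob = Fraction(factorial(n))
    for k in range(n):
        prob /= theta + k

    # Product over j of theta^{a_j} / (j^{a_j} a_j!)
    for j, a_j in enumerate(a, start=1):
        prob *= theta ** a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


# n = 3, theta = 1: one allele seen once and one allele seen twice.
print(ewens_probability((1, 1, 0), 1))  # prints 1/2
```

Exact fractions avoid rounding error for small n; for large n a log-space floating-point version would be the more practical choice.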

The phrase "under certain conditions" used above is made precise by the following assumptions:

  • The sample size n is small by comparison to the size of the whole population; and
  • The population is in statistical equilibrium under mutation and genetic drift and the role of selection at the locus in question is negligible; and
  • Every mutant allele is novel (see also: infinite-alleles model).

This is a probability distribution on the set of all partitions of the integer n. Among probabilists and statisticians it is often called the multivariate Ewens distribution.
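That the formula defines a probability distribution on the partitions of n can be checked numerically for small n. The sketch below enumerates every partition of n, converts it to allele counts, and sums the probabilities. All function names are illustrative assumptions.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(a, theta):
    """Ewens's sampling formula for allele counts a = (a_1, ..., a_n)."""
    n = len(a)
    theta = Fraction(theta)
    prob = Fraction(factorial(n))
    for k in range(n):
        prob /= theta + k
    for j, a_j in enumerate(a, start=1):
        prob *= theta ** a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


def partitions(n, largest=None):
    """Yield each partition of n as a non-increasing tuple of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest


def ewens_distribution(n, theta):
    """Map each allele-count vector for sample size n to its probability."""
    dist = {}
    for parts in partitions(n):
        a = [0] * n
        for part in parts:
            a[part - 1] += 1  # one more allele seen `part` times
        dist[tuple(a)] = ewens_probability(a, theta)
    return dist


dist = ewens_distribution(5, Fraction(3, 2))
print(len(dist))           # number of partitions of 5
print(sum(dist.values()))  # total probability
```

With exact rational arithmetic the total is exactly 1 for any positive rational θ, which makes this a convenient sanity check on an implementation of the formula.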

Mathematical properties

As θ → 0, the probability that all n genes are the same approaches 1. When θ = 1, the distribution is precisely that of the integer partition induced by a uniformly distributed random permutation, that is, the partition of n given by its cycle lengths. As θ → ∞, the probability that no two of the n genes are the same approaches 1.
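These three regimes can be checked for a small sample. The sketch below, with illustrative function names, compares the formula at θ = 1 with the cycle-type distribution obtained by enumerating all permutations of 4 elements, and evaluates the two extreme partitions at a very small and a very large θ.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def ewens_probability(a, theta):
    """Ewens's sampling formula for allele counts a = (a_1, ..., a_n)."""
    n = len(a)
    theta = Fraction(theta)
    prob = Fraction(factorial(n))
    for k in range(n):
        prob /= theta + k
    for j, a_j in enumerate(a, start=1):
        prob *= theta ** a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


def cycle_type(perm):
    """Return (a_1, ..., a_n), where a_j counts the cycles of length j in perm."""
    n = len(perm)
    seen = [False] * n
    a = [0] * n
    for start in range(n):
        length = 0
        i = start
        while not seen[i]:  # walk the cycle containing `start`
            seen[i] = True
            i = perm[i]
            length += 1
        if length:
            a[length - 1] += 1
    return tuple(a)


n = 4
counts = Counter(cycle_type(p) for p in permutations(range(n)))
uniform = {a: Fraction(c, factorial(n)) for a, c in counts.items()}
ewens_at_one = {a: ewens_probability(a, 1) for a in counts}
print(uniform == ewens_at_one)

all_same = (0, 0, 0, 1)      # one allele seen four times
all_distinct = (4, 0, 0, 0)  # four alleles seen once each
print(float(ewens_probability(all_same, Fraction(1, 1000))))
print(float(ewens_probability(all_distinct, 1000)))
```

The last two values are close to 1 but not equal to it, reflecting that the statements about θ → 0 and θ → ∞ are limits.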

This family of probability distributions enjoys the property that if after the sample of n is taken, m of the n gametes are chosen without replacement, then the resulting probability distribution on the set of all partitions of the smaller integer m is just what the formula above would give if m were put in place of n.
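This consistency property can be verified exactly for small samples. Choosing m of the n gametes without replacement is equivalent to discarding n − m gametes one at a time, each uniformly at random, so the sketch below implements a single deletion step and compares the result with the formula for the smaller sample. The helper names are assumptions made for illustration.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(a, theta):
    """Ewens's sampling formula for allele counts a = (a_1, ..., a_n)."""
    n = len(a)
    theta = Fraction(theta)
    prob = Fraction(factorial(n))
    for k in range(n):
        prob /= theta + k
    for j, a_j in enumerate(a, start=1):
        prob *= theta ** a_j / (Fraction(j) ** a_j * factorial(a_j))
    return prob


def partitions(n, largest=None):
    """Yield each partition of n as a non-increasing tuple of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest


def ewens_distribution(n, theta):
    """Map each allele-count vector for sample size n to its probability."""
    dist = {}
    for parts in partitions(n):
        a = [0] * n
        for part in parts:
            a[part - 1] += 1
        dist[tuple(a)] = ewens_probability(a, theta)
    return dist


def remove_one(dist, n):
    """Distribution of allele counts after deleting one uniformly chosen gamete."""
    out = {}
    for a, p in dist.items():
        for j in range(1, n + 1):
            if a[j - 1] == 0:
                continue
            b = list(a)
            b[j - 1] -= 1      # an allele seen j times loses one copy ...
            if j > 1:
                b[j - 2] += 1  # ... and is now seen j - 1 times
            b = tuple(b[: n - 1])
            # The deleted gamete carries such an allele with probability j * a_j / n.
            out[b] = out.get(b, 0) + p * Fraction(j * a[j - 1], n)
    return out


theta = Fraction(3, 2)
print(remove_one(ewens_distribution(5, theta), 5) == ewens_distribution(4, theta))
```

Applying `remove_one` repeatedly takes a sample of size n down to any smaller size m.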

The Ewens distribution arises naturally from the Chinese restaurant process.
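The connection can be illustrated by simulation. The article does not spell out the seating rule, so the sketch below assumes the standard one: when i customers are already seated, the next customer starts a new table with probability θ/(θ + i) and otherwise joins an existing table with probability proportional to its size. Table sizes then play the role of allele multiplicities. Function names, the seed, and the number of draws are arbitrary choices.

```python
import random
from collections import Counter


def chinese_restaurant_sample(n, theta, rng):
    """Seat n customers and return the list of table sizes."""
    tables = []
    for i in range(n):  # i customers are already seated
        if rng.random() * (theta + i) < theta:
            tables.append(1)  # new table, probability theta / (theta + i)
        else:
            # Join an existing table with probability proportional to its size.
            k = rng.choices(range(len(tables)), weights=tables)[0]
            tables[k] += 1
    return tables


def allele_counts(tables, n):
    """Convert table sizes to (a_1, ..., a_n), where a_j counts tables of size j."""
    a = [0] * n
    for size in tables:
        a[size - 1] += 1
    return tuple(a)


def empirical_frequencies(n, theta, draws, seed):
    """Estimate the distribution of allele counts from repeated simulations."""
    rng = random.Random(seed)
    counts = Counter(
        allele_counts(chinese_restaurant_sample(n, theta, rng), n)
        for _ in range(draws)
    )
    return {a: c / draws for a, c in counts.items()}


freq = empirical_frequencies(4, 1.0, 20000, seed=1)
for a in sorted(freq):
    print(a, round(freq[a], 3))
```

For n = 4 and θ = 1 the formula gives probability 1/4 for a single allele seen four times and 1/24 for four distinct alleles, so the simulated frequencies for (0, 0, 0, 1) and (4, 0, 0, 0) should be close to those values.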

Notes

  • Warren Ewens, "The sampling theory of selectively neutral alleles", Theoretical Population Biology, volume 3, pages 87–112, 1972.
  • H. Crane, "The Ubiquitous Ewens Sampling Formula", Statistical Science, volume 31, number 1, 2016. This article introduces a series of seven articles about Ewens sampling in a special issue of the journal.
  • J.F.C. Kingman, "Random partitions in population genetics", Proceedings of the Royal Society of London, Series A, Mathematical and Physical Sciences, volume 361, number 1704, 1978.
  • S. Tavaré and W. J. Ewens, "The Multivariate Ewens Distribution", 1997 (Chapter 41 of the reference below).
  • N.L. Johnson, S. Kotz, and N. Balakrishnan (1997) Discrete Multivariate Distributions, Wiley. ISBN 0-471-12844-9.