Notation in probability and statistics

Article snapshot taken from Wikipedia under the Creative Commons Attribution-ShareAlike license.

Probability theory and statistics have some commonly used conventions, in addition to standard mathematical notation and mathematical symbols.

Probability theory

  • Random variables are usually written in upper case Roman letters, such as $X$ or $Y$. A random variable usually stands for a quantity described in words, such as "the height of a subject" for a continuous variable, "the number of cars in the school car park" for a discrete variable, or "the colour of the next bicycle" for a categorical variable. It does not represent a single number or a single category. For instance, $P(X = x)$ is the probability that a particular realisation of the random variable $X$ (e.g., height, number of cars, or bicycle colour) equals a particular value or category $x$ (e.g., 1.735 m, 52, or purple). $X$ and $x$ must not be confused: $X$ is an idea, $x$ is a value. They are related, but they do not have identical meanings.
  • Particular realisations of a random variable are written in the corresponding lower case letters. For example, $x_1, x_2, \ldots, x_n$ could be a sample corresponding to the random variable $X$. A cumulative probability is formally written $P(X \leq x)$ to distinguish the random variable from its realisation.
  • The probability measure is sometimes written $\mathbb{P}$ to distinguish it from other functions and from a generic measure $P$, which avoids having to state that "$P$ is a probability". $\mathbb{P}(X \in A)$ is short for $P(\{\omega \in \Omega : X(\omega) \in A\})$, where $\Omega$ is the sample space (the set of all outcomes), $X$ is a random variable that is a function of $\omega$ (i.e., it depends upon $\omega$), and $\omega$ is some outcome of interest in $\Omega$ (say, a particular height, or a particular colour of a car). The notation $\Pr(A)$ is also used.
  • $\mathbb{P}(A \cap B)$ or $\mathbb{P}[B \cap A]$ indicates the probability that events $A$ and $B$ both occur. The joint probability distribution of random variables $X$ and $Y$ is denoted by $P(X, Y)$, the joint probability mass function or probability density function by $f(x, y)$, and the joint cumulative distribution function by $F(x, y)$.
  • $\mathbb{P}(A \cup B)$ or $\mathbb{P}[B \cup A]$ indicates the probability of either event $A$ or event $B$ occurring ("or" in this case means one or the other or both).
  • σ-algebras are usually written with uppercase calligraphic letters (e.g. $\mathcal{F}$ for the set of sets on which the probability $P$ is defined).
  • Probability density functions (pdfs) and probability mass functions are denoted by lowercase letters, e.g. $f(x)$ or $f_X(x)$.
  • Cumulative distribution functions (cdfs) are denoted by uppercase letters, e.g. $F(x)$ or $F_X(x)$.
  • Survival functions or complementary cumulative distribution functions are often denoted by placing an overbar over the symbol for the cumulative distribution function, $\overline{F}(x) = 1 - F(x)$, or denoted by $S(x)$.
  • In particular, the pdf of the standard normal distribution is denoted by $\varphi(z)$, and its cdf by $\Phi(z)$.
  • Some common operators:
    • $\mathrm{E}[X]$: expected value of $X$
    • $\operatorname{var}[X]$: variance of $X$
    • $\operatorname{cov}[X, Y]$: covariance of $X$ and $Y$
  • "$X$ is independent of $Y$" is often written $X \perp Y$ or $X \perp\!\!\!\perp Y$, and "$X$ is independent of $Y$ given $W$" is often written $X \perp\!\!\!\perp Y \,|\, W$ or $X \perp Y \,|\, W$.
  • $P(A \mid B)$, the conditional probability, is the probability of $A$ given $B$.
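
The notation above can be made concrete on a small finite sample space. The following Python sketch is illustrative only: the two-dice example, the helper `prob`, and the events `A` and `B` are assumptions introduced here, not part of the article. It treats events as sets of outcomes, the random variable $X$ as a function of the outcome $\omega$, and uses the standard identity $P(A \mid B) = P(A \cap B) / P(B)$.

```python
from fractions import Fraction
from itertools import product

# Sample space Omega: all ordered outcomes of two fair six-sided dice
# (an assumed example, chosen so every probability is an exact fraction).
omega_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) for an event given as a set of outcomes, under the uniform measure."""
    return Fraction(len(event), len(omega_space))

def X(omega):
    """Random variable X: the sum of the two dice, a function of the outcome omega."""
    return omega[0] + omega[1]

# P(X in A) is short for P({omega in Omega : X(omega) in A}).
A_values = {7, 11}
p_X_in_A = prob({w for w in omega_space if X(w) in A_values})

# P(X <= x): the cumulative probability at the value x = 4.
p_X_le_4 = prob({w for w in omega_space if X(w) <= 4})

# Events A and B as subsets of Omega.
A = {w for w in omega_space if w[0] == 6}    # first die shows 6
B = {w for w in omega_space if X(w) >= 10}   # sum is at least 10

p_A_and_B = prob(A & B)             # P(A ∩ B)
p_A_or_B = prob(A | B)              # P(A ∪ B)
p_A_given_B = p_A_and_B / prob(B)   # P(A | B) = P(A ∩ B) / P(B)

# E[X] and var[X] computed directly from their definitions.
E_X = sum(Fraction(X(w)) for w in omega_space) / len(omega_space)
var_X = sum((X(w) - E_X) ** 2 for w in omega_space) / len(omega_space)
```

Note how the upper case `X` is a function (the idea), while values such as `4` or the elements of `A_values` play the role of the lower case $x$.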

Statistics

  • Greek letters (e.g. θ, β) are commonly used to denote unknown parameters (population parameters).
  • A tilde (~) denotes "has the probability distribution of".
  • Placing a hat, or caret (also known as a circumflex), over a true parameter denotes an estimator of it, e.g., $\widehat{\theta}$ is an estimator for $\theta$.
  • The arithmetic mean of a series of values $x_1, x_2, \ldots, x_n$ is often denoted by placing an "overbar" over the symbol, e.g. $\bar{x}$, pronounced "$x$ bar".
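  • Some commonly used symbols for sample statistics are given below:
    • the sample mean $\bar{x}$,
    • the sample variance $s^2$,
    • the sample standard deviation $s$,
    • the sample correlation coefficient $r$,
    • the sample cumulants $k_r$.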
  • Some commonly used symbols for population parameters are given below:
    • the population mean $\mu$,
    • the population variance $\sigma^2$,
    • the population standard deviation $\sigma$,
    • the population correlation $\rho$,
    • the population cumulants $\kappa_r$.
  • $x_{(k)}$ is used for the $k^{\text{th}}$ order statistic, where $x_{(1)}$ is the sample minimum and $x_{(n)}$ is the sample maximum from a total sample size $n$.
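
As a rough illustration of the sample-side notation, the Python sketch below computes the sample mean, the sample variance and standard deviation, and the order statistics for a small made-up sample. The data values and the helper `order_statistic` are hypothetical, chosen only for this example; the calls come from Python's standard `statistics` module.

```python
import statistics

# A hypothetical sample x_1, ..., x_n (realisations, hence lower case).
x = [4.0, 1.0, 3.0, 2.0, 5.0]
n = len(x)

x_bar = statistics.mean(x)    # sample mean, "x bar"
s2 = statistics.variance(x)   # sample variance (divides by n - 1)
s = statistics.stdev(x)       # sample standard deviation

# Used as an estimator of the population mean mu, x_bar plays the role of mu-hat.
mu_hat = x_bar

# Order statistics: x_(1) <= x_(2) <= ... <= x_(n).
x_sorted = sorted(x)

def order_statistic(k):
    """Return x_(k), the k-th order statistic, with k counted from 1."""
    return x_sorted[k - 1]

x_min = order_statistic(1)   # x_(1), the sample minimum
x_max = order_statistic(n)   # x_(n), the sample maximum
```

The subscript in $x_{(k)}$ counts from 1, whereas Python lists are indexed from 0, which is why the helper subtracts one.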

Critical values

The $\alpha$-level upper critical value of a probability distribution is the value exceeded with probability $\alpha$, that is, the value $x_\alpha$ such that $F(x_\alpha) = 1 - \alpha$, where $F$ is the cumulative distribution function. There are standard notations for the upper critical values of some commonly used distributions in statistics:

  • $z_\alpha$ or $z(\alpha)$ for the standard normal distribution
  • $t_{\alpha,\nu}$ or $t(\alpha, \nu)$ for the t-distribution with $\nu$ degrees of freedom
  • $\chi^2_{\alpha,\nu}$ or $\chi^2(\alpha, \nu)$ for the chi-squared distribution with $\nu$ degrees of freedom
  • $F_{\alpha,\nu_1,\nu_2}$ or $F(\alpha, \nu_1, \nu_2)$ for the F-distribution with $\nu_1$ and $\nu_2$ degrees of freedom
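
For the standard normal case, the defining relation $F(x_\alpha) = 1 - \alpha$ means that $z_\alpha$ is the inverse cdf evaluated at $1 - \alpha$. The minimal Python sketch below uses `statistics.NormalDist` from the standard library; the function name `upper_critical_value_normal` is a hypothetical helper introduced for this example. The t, chi-squared, and F cases follow the same pattern but need a library that provides those inverse cdfs, so they are not shown.

```python
from statistics import NormalDist

def upper_critical_value_normal(alpha):
    """Return z_alpha, the value a standard normal variable exceeds with probability alpha."""
    # F(z_alpha) = 1 - alpha, so z_alpha is the inverse cdf evaluated at 1 - alpha.
    return NormalDist().inv_cdf(1 - alpha)

z_05 = upper_critical_value_normal(0.05)     # roughly 1.645
z_025 = upper_critical_value_normal(0.025)   # roughly 1.960

# Check the defining property: the cdf at z_alpha equals 1 - alpha.
cdf_at_z_05 = NormalDist().cdf(z_05)
```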

Linear algebra

  • Matrices are usually denoted by boldface capital letters, e.g. $\mathbf{A}$.
  • Column vectors are usually denoted by boldface lowercase letters, e.g. $\mathbf{x}$.
  • The transpose operator is denoted by either a superscript T (e.g. $\mathbf{A}^{\mathrm{T}}$) or a prime symbol (e.g. $\mathbf{A}'$).
  • A row vector is written as the transpose of a column vector, e.g. $\mathbf{x}^{\mathrm{T}}$ or $\mathbf{x}'$.
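
The column-versus-row convention can be sketched in plain Python, storing a matrix as a list of rows. The values and the helpers `transpose` and `matmul` are assumptions made for this illustration; in practice a numerical library would normally be used instead.

```python
# Column vector x as a list of one-element rows (an n x 1 matrix).
x = [[1], [2], [3]]

# Matrix A as a list of rows (2 x 3).
A = [[1, 2, 3],
     [4, 5, 6]]

def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(M, N):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)] for row in M]

x_T = transpose(x)     # row vector x^T, a 1 x 3 matrix
A_T = transpose(A)     # A^T, a 3 x 2 matrix
Ax = matmul(A, x)      # A x, a 2 x 1 column vector
xTx = matmul(x_T, x)   # x^T x, a 1 x 1 matrix holding the inner product
```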

Abbreviations

Common abbreviations include:
