Statistical population: Difference between revisions


Revision as of 23:22, 28 August 2024 by JavasciptSunrise (6 edits). Edit summary: Important information. Tags: Visual edit, Newcomer task, Newcomer task: links.
Latest revision as of 03:32, 4 November 2024 by ClueBot NG (Bots, Pending changes reviewers, Rollbackers; 6,439,179 edits), minor edit. Edit summary: Reverting possible vandalism by 142.167.222.121 to version by Moonsharma123. Report False Positive? Thanks, ClueBot NG. (4355622) (Bot). Tag: Rollback.
(85 intermediate revisions by 71 users not shown)
Line 1:		Line 1:
	{{Short description\|Complete set of items that share at least one property in common}}		{{Short description\|Complete set of items that share at least one property in common}}
	{{For\|the number of people\|Population}}		{{For\|the number of people\|Population}}
	A '''~~statistical~~ population''' is a ] of similar items or events which is of interest for some ] or ].<ref>{{Cite web\|title=Glossary of statistical terms: Population\|website=]\|url=http://www.statistics.com/glossary&term_id=812\|access-date=22 February 2016}}</ref> A statistical population can be a group of existing objects (e.g. the set of all ]s within the ] ]) or a ] and potentially ] group of objects conceived as a ] from experience (e.g. the set of all possible hands in a game of ]).<ref>{{MathWorld\|Population}}</ref> A common aim of statistical analysis is to produce ] about some chosen population.<ref>{{cite book \| last1 = Yates \| first1 = Daniel S. \| last2 = Moore \| first2 = David S \| last3 = Starnes \| first3 = Daren S. \| year = 2003 \| title = The Practice of Statistics \| edition = 2nd \| publisher = ] \| location = New York \| url = http://bcs.whfreeman.com/yates2e/ \| isbn = 978-0-7167-4773-4 \| url-status = dead \| archive-url = https://web.archive.org/web/20050209001108/HTTP://bcs.whfreeman.com/yates2e/ \| archive-date = 2005-02-09 }}</ref>		In ], a '''population''' is a ] of similar items or events which is of interest for some question or ].<ref>{{Cite web\|title=Glossary of statistical terms: Population\|website=]\|url=http://www.statistics.com/glossary&term_id=812\|access-date=22 February 2016}}</ref> A statistical population can be a group of existing objects (e.g. the set of all stars within the ]) or a ] and potentially ] group of objects conceived as a generalization from experience (e.g. the set of all possible hands in a game of poker).<ref>{{MathWorld\|Population}}</ref> A common aim of statistical analysis is to produce ] about some chosen population.<ref>{{cite book \| last1 = Yates \| first1 = Daniel S. \| last2 = Moore \| first2 = David S \| last3 = Starnes \| first3 = Daren S. 
\| year = 2003 \| title = The Practice of Statistics \| edition = 2nd \| publisher = ] \| location = New York \| url = http://bcs.whfreeman.com/yates2e/ \| isbn = 978-0-7167-4773-4 \| url-status = dead \| archive-url = https://web.archive.org/web/20050209001108/HTTP://bcs.whfreeman.com/yates2e/ \| archive-date = 2005-02-09 }}</ref>

	In ], a ] of the population (a statistical '']'') is chosen to represent the population in a statistical analysis.<ref>{{Cite web\|title=Glossary of statistical terms: Sample\|website=]\|url=http://www.statistics.com/glossary&term_id=281\|access-date=22 February 2016}}</ref> Moreover, the statistical sample must be ] and ] model the population (every unit of the population has an equal chance of selection). The ] of the size of this statistical sample to the size of the population is called a '']''. It is then possible to ] the '']s'' using the appropriate ].		In ], a subset of the population (a statistical '']'') is chosen to represent the population in a statistical analysis.<ref>{{Cite web\|title=Glossary of statistical terms: Sample\|website=]\|url=http://www.statistics.com/glossary&term_id=281\|access-date=22 February 2016}}</ref> Moreover, the statistical sample must be ] and ] model the population (every unit of the population has an equal chance of selection). The ratio of the size of this statistical sample to the size of the population is called a '']''. It is then possible to ] the '']s'' using the appropriate ].

	==Mean==		==Mean==
	The '''population mean''', or population ], is a measure of the ] either of a ] or of a ] characterized by that distribution.<ref>{{cite book\|last=Feller\|first=William\|title=Introduction to Probability Theory and its Applications, Vol I\|year=1950\|publisher=Wiley\|isbn=0471257087\|pages=221}}</ref> In a ] of a ] ''X'', the mean is equal to the sum over every possible value weighted by the probability of that value; that is, it is computed by taking the ] of each possible value ''x'' of ''X'' and its probability ''p''(''x''), and then adding all these products together, giving <math>\mu = \sum x p(x)....</math>.<ref>Elementary Statistics by Robert R. Johnson and Patricia J. Kuby, </ref><ref name=":1">{{Cite web\|last=Weisstein\|first=Eric W.\|title=Population Mean\|url=https://mathworld.wolfram.com/PopulationMean.html\|access-date=2020-08-21\|website=mathworld.wolfram.com\|language=en}}</ref> An analogous formula applies to the case of a ]. Not every probability distribution has a defined mean (see the ] for an example). Moreover, the mean can be ] for some distributions.		The '''population mean''', or population ], is a measure of the ] either of a ] or of a ] characterized by that distribution.<ref>{{cite book\|last=Feller\|first=William\|title=Introduction to Probability Theory and its Applications, Vol I\|year=1950\|publisher=Wiley\|isbn=0471257087\|pages=221}}</ref> In a ] of a random variable <math>X</math>, the mean is equal to the sum over every possible value weighted by the probability of that value; that is, it is computed by taking the product of each possible value <math>x</math> of <math>X</math> and its probability <math>p(x)</math>, and then adding all these products together, giving <math>\mu = \sum x \cdot p(x)....</math>.<ref>Elementary Statistics by Robert R. Johnson and Patricia J. 
Kuby, </ref><ref name=":1">{{Cite web\|last=Weisstein\|first=Eric W.\|title=Population Mean\|url=https://mathworld.wolfram.com/PopulationMean.html\|access-date=2020-08-21\|website=mathworld.wolfram.com\|language=en}}</ref> An analogous formula applies to the case of a ]. Not every probability distribution has a defined mean (see the ] for an example). Moreover, the mean can be infinite for some distributions.

	For a finite population, the population mean of a property is equal to the ] mean of the given property, while considering every member of the population. For example, the population mean height is equal to the ] of the heights of every individual—divided by the total number of individuals. The '']'' may differ from the population mean, especially for small samples. The ] states that the larger the size of the sample, the more likely it is that the sample mean will be close to the population mean.<ref>Schaum's Outline of Theory and Problems of Probability by Seymour Lipschutz and Marc Lipson, </ref>		For a finite population, the population mean of a property is equal to the arithmetic mean of the given property, while considering every member of the population. For example, the population mean height is equal to the sum of the heights of every individual—divided by the total number of individuals. The '']'' may differ from the population mean, especially for small samples. The ] states that the larger the size of the sample, the more likely it is that the sample mean will be close to the population mean.<ref>Schaum's Outline of Theory and Problems of Probability by Seymour Lipschutz and Marc Lipson, </ref>

	==Sub population==
	{{Multiple issues\|section=yes\|August 2024
	{{Underlinked\|section\|date=August 2024}}
	{{Unreferenced section\|date=August 2024}}
	}}
	A subset of a population that shares one or more additional properties is called a ''sub population''. For example, if the population is all ] people, a sub population is all Egyptian ]; if the population is all ] in the world, a sub population is all pharmacies in ]. By contrast, a sample is a subset of a population that is not chosen to share any additional property.

	] may yield different results for different sub populations. For instance, a particular medicine may have different effects on different sub populations, and these effects may be obscured or dismissed if such special sub populations are not identified and examined in isolation.

	Similarly, one can often estimate ] more accurately if one separates out sub populations: the distribution of heights among people is better modeled by considering men and women as separate sub populations, for instance.

	Populations consisting of sub populations can be modeled by ]s, which combine the distributions within sub populations into an overall ]. Even if sub populations are well-modeled by given simple models, the overall population may be poorly fit by a given simple model – poor fit may be evidence for the existence of sub populations. For example, given two equal sub populations, both normally distributed, if they have the same standard ] but different means, the overall distribution will exhibit low ] relative to a single normal distribution – the means of the sub populations fall on the shoulders of the overall distribution. If sufficiently separated, these form a ]; otherwise, it simply has a wide . Further, it will exhibit ] relative to a single normal distribution with the given variation. Alternatively, given two sub populations with the same mean but different standard deviations, the overall population will exhibit high kurtosis, with a sharper peak and ] (and correspondingly shallower shoulders) than a single distribution.

	Analyzing sub populations can help in understanding how certain factors affect different segments of the population, which might not be apparent when looking at the population as a whole. For instance, the ] of a sub population, which can be viewed as a ] of the distribution, may differ significantly from that of the entire population, highlighting unique characteristics of that subgroup.

	The identification and analysis of sub populations can be crucial for making accurate statistical inferences. As noted by Wang et al. (2008)<ref>{{Cite journal \|last=West \|first=Brady T. \|last2=Berglund \|first2=Patricia \|last3=Heeringa \|first3=Steven G. \|date=2008-12 \|title=A Closer Examination of Subpopulation Analysis of Complex-Sample Survey Data \|url=http://journals.sagepub.com/doi/10.1177/1536867X0800800404 \|journal=The Stata Journal: Promoting communications on statistics and Stata \|language=en \|volume=8 \|issue=4 \|pages=520–531 \|doi=10.1177/1536867X0800800404 \|issn=1536-867X}}</ref>, sub population analysis is essential for increasing the precision of estimates and controlling for potential ] that might otherwise distort the results of a study. This approach is particularly important when sub populations exhibit ] characteristics that need to be accounted for in statistical models. Furthermore, Murari et al. (2009) <ref>{{Cite journal \|last=Huang \|first=Shuguang \|date=2010-07 \|title=Statistical Issues in Subpopulation Analysis of High Content Imaging Data \|url=http://www.liebertpub.com/doi/10.1089/cmb.2009.0071 \|journal=Journal of Computational Biology \|language=en \|volume=17 \|issue=7 \|pages=879–894 \|doi=10.1089/cmb.2009.0071 \|issn=1066-5277}}</ref> discuss the application of these concepts in biological modeling, emphasizing the importance of recognizing sub populations to better understand complex ] and improve the accuracy of ].

	== Universe of data ==
	A statistical population is one of the fundamental concepts in ] and ], referring to the complete set of items or events that share a common ], from which ] can be gathered and analyzed. It is often considered the "universe" of data, encompassing all possible subjects of interest—every person, object, or event that fits the criteria being studied. For instance, the population in a study on ] could include every person on ].

	Populations can be finite or infinite. Finite populations are countable, such as the number of students in a school. Infinite populations, on the other hand, are uncountable or so large that they're treated as infinite, like the number of grains of sand on a beach or the number of stars in the universe. When studying a population, researchers can either collect data from every member (a census) or a subset (a sample). ] are more accurate but expensive and time-consuming. ] is more practical but introduces the need for ] to ensure the sample represents the population well.

	The target population is the entire group the researcher is interested in, while the accessible population is the portion that can actually be studied due to practical constraints. For example, a study might target all high school students in a country but only access students from a few schools.

	The concept of a population is crucial for ]. Inferences made from samples are generalized to the population. The goal is to make accurate conclusions about the population based on sample data, often involving probabilities and confidence intervals.

	Populations can change over time. A dynamic population is one where the membership can change, such as the population of a city, which fluctuates due to ], ], and ]. This complicates studies, requiring adjustments in ].

	Populations are described by parameters like the ] (average), ], and ]. These parameters are often unknown and are estimated through ]. For example, the mean income of a country’s population might be estimated from a sample of taxpayers.

	In ], a population refers to a group of ] of the same ] living in a particular ]. Ecologists study population dynamics, including birth rates, death rates, and migration patterns, to understand species survival and ] health.

	In the era of ], the concept of population is evolving. With access to massive ], researchers can sometimes work with entire populations of data (e.g., all Twitter users). This reduces reliance on ], but it also introduces challenges in data ] and ]. When defining and studying populations, especially ], ethical considerations are paramount. Issues like informed consent, privacy, and the potential for bias in selecting samples must be carefully managed to ensure that research is both valid and respectful of participants' rights.

	== Importance of Population ==
	The Australian Government Bureau of Statistics notes:<blockquote>"It is important to understand the target population being studied, so you can understand who or what the data are referring to. If you have not clearly defined who or what you want in your population, you may end up with data that are not useful to you."<ref>{{Cite web \|title=What Is a Population in Statistics? \|url=https://www.thoughtco.com/what-is-a-population-in-statistics-3126308 \|access-date=2024-08-28 \|website=ThoughtCo \|language=en}}</ref></blockquote>

	==See also==		==See also==

Latest revision as of 03:32, 4 November 2024

Complete set of items that share at least one property in common

For the number of people, see Population.

In statistics, a population is a set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way galaxy) or a hypothetical and potentially infinite group of objects conceived as a generalization from experience (e.g. the set of all possible hands in a game of poker). A common aim of statistical analysis is to produce information about some chosen population.

In statistical inference, a subset of the population (a statistical sample) is chosen to represent the population in a statistical analysis. The sample must be unbiased and must accurately model the population; under simple random sampling, for example, every unit of the population has an equal chance of selection. The ratio of the size of the sample to the size of the population is called the sampling fraction. The population parameters can then be estimated using the appropriate sample statistics.
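The idea of equal-chance selection and the sampling fraction can be sketched in a few lines of Python. This is only an illustration: the population of 1,000 numbered units, the sample size of 50, and the helper names `simple_random_sample` and `sampling_fraction` are assumptions made for the example, not part of the article.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units; every unit has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement


def sampling_fraction(sample_size, population_size):
    """Ratio of the sample size to the population size."""
    return sample_size / population_size


# Hypothetical population: 1,000 units identified by the numbers 1..1000.
population = list(range(1, 1001))
sample = simple_random_sample(population, 50, seed=0)
fraction = sampling_fraction(len(sample), len(population))  # 50 / 1000 = 0.05
```

Fixing the seed only makes the draw reproducible; it does not change the fact that each unit is equally likely to be chosen.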

Mean

The population mean, or population expected value, is a measure of the central tendency either of a probability distribution or of a random variable characterized by that distribution. In a discrete probability distribution of a random variable $X$, the mean is equal to the sum over every possible value weighted by the probability of that value; that is, it is computed by taking the product of each possible value $x$ of $X$ and its probability $p(x)$, and then adding all these products together, giving $\mu = \sum_x x \cdot p(x)$. An analogous formula applies to the case of a continuous probability distribution. Not every probability distribution has a defined mean (see the Cauchy distribution for an example). Moreover, the mean can be infinite for some distributions.
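As a minimal sketch of the discrete formula, the following Python computes the sum of each value times its probability. The fair six-sided die and the function name `discrete_mean` are assumptions chosen for illustration; exact fractions are used so that the probabilities can be checked to sum to one without rounding error.

```python
from fractions import Fraction


def discrete_mean(pmf):
    """Return the sum of x * p(x) over a mapping {value: probability}."""
    # The exact equality check assumes exact probabilities (e.g. Fraction).
    if sum(pmf.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in pmf.items())


# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mu = discrete_mean(die)  # (1 + 2 + 3 + 4 + 5 + 6) / 6 = 7/2
```

With floating-point probabilities, the equality check would need a tolerance instead.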

For a finite population, the population mean of a property is equal to the arithmetic mean of that property taken over every member of the population. For example, the population mean height is equal to the sum of the heights of all individuals divided by the total number of individuals. The sample mean may differ from the population mean, especially for small samples. The law of large numbers states that the larger the size of the sample, the more likely it is that the sample mean will be close to the population mean.
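The relationship between the sample mean and the population mean can be explored with a small simulation. The sketch below assumes a made-up finite population of 10,000 heights (generated here from a normal distribution with mean 170 cm and standard deviation 10 cm) and draws samples with replacement, which matches the independent-draws setting in which the law of large numbers is usually stated. All names and numbers are illustrative.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical finite population: 10,000 heights in centimetres.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Population mean: sum of all heights divided by the number of individuals.
population_mean = statistics.fmean(population)


def sample_mean(population, n, rng):
    """Mean of n units drawn independently (with replacement)."""
    return statistics.fmean(rng.choices(population, k=n))


# Absolute gap between sample mean and population mean for growing samples.
errors = {
    n: abs(sample_mean(population, n, rng) - population_mean)
    for n in (10, 1_000, 100_000)
}
```

Any single run can show a smaller sample landing closer by chance; the law of large numbers is a statement about what becomes more likely as the sample grows, not a guarantee for each draw.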

References

"Glossary of statistical terms: Population". Statistics.com. Retrieved 22 February 2016.
Weisstein, Eric W. "Statistical population". MathWorld.
Yates, Daniel S.; Moore, David S; Starnes, Daren S. (2003). The Practice of Statistics (2nd ed.). New York: Freeman. ISBN 978-0-7167-4773-4. Archived from the original on 2005-02-09.
"Glossary of statistical terms: Sample". Statistics.com. Retrieved 22 February 2016.
Feller, William (1950). Introduction to Probability Theory and its Applications, Vol I. Wiley. p. 221. ISBN 0471257087.
Johnson, Robert R.; Kuby, Patricia J. Elementary Statistics. p. 279.
Weisstein, Eric W. "Population Mean". mathworld.wolfram.com. Retrieved 2020-08-21.
Lipschutz, Seymour; Lipson, Marc. Schaum's Outline of Theory and Problems of Probability. p. 141.

External links

Statistical Terms Made Simple

