Situation where the mean of many measurements differs significantly from the actual value
For other uses, see Bias (disambiguation).
Statistical bias is a systematic tendency which causes differences between results and facts. Bias can arise at any point in the data-analysis process, including the source of the data, the choice of estimator, and the way the data are analysed. Bias may have a serious impact on results. Consider, for example, a survey of people's buying habits: if the sample size is not large enough, the results may not be representative of the buying habits of the population as a whole, so there may be discrepancies between the survey results and reality. Understanding the sources of statistical bias therefore helps in assessing whether observed results are close to actuality.
Bias can be differentiated from other sources of error, such as inaccuracy (instrument failure or inadequacy), lack of data, or mistakes in transcription (typos). Bias implies that the data selection may have been skewed by the collection criteria.
Bias does not preclude the existence of any other mistakes. One may have a poorly designed sample, an inaccurate measurement device, and typos in recording data simultaneously.
It is also useful to recognize that the term "error" refers specifically to the outcome rather than the process (errors of rejection or acceptance of the hypothesis being tested). Using "flaw" or "mistake" for procedural errors is recommended, to differentiate them from these specifically defined outcome-based terms.
Bias of an estimator
Main article: Bias of an estimator
Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated. The bias of an estimator of a parameter should not be confused with its degree of precision, as the degree of precision is a measure of the sampling error. The bias is defined as follows: let T be a statistic used to estimate a parameter θ, and let E(T) denote the expected value of T. Then,
bias(T) = E(T) − θ
is called the bias of the statistic T (with respect to θ). If bias(T) = 0, then T is said to be an unbiased estimator of θ; otherwise, it is said to be a biased estimator of θ.
The bias of a statistic is always relative to the parameter it is used to estimate, but the parameter is often omitted when it is clear from the context what is being estimated.
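As an illustration of this definition, the following Python sketch estimates by simulation the bias of the "divide by n" variance estimator and compares it with the "divide by n − 1" sample variance; the population, sample size and number of replications are arbitrary choices made for the example.

```python
import numpy as np

# Assumed setup for illustration: samples from a normal population with a
# known variance, compared across two common variance estimators.
rng = np.random.default_rng(0)
true_var = 4.0           # population variance
n, reps = 10, 100_000    # sample size and number of simulated samples

samples = rng.normal(loc=0.0, scale=true_var ** 0.5, size=(reps, n))

# "Divide by n" estimator (ddof=0): biased, with E[T] = (n - 1)/n * sigma^2.
var_n = samples.var(axis=1, ddof=0)
# "Divide by n - 1" sample variance (ddof=1): unbiased.
var_n_minus_1 = samples.var(axis=1, ddof=1)

print("simulated bias, divide by n:    ", var_n.mean() - true_var)
print("simulated bias, divide by n - 1:", var_n_minus_1.mean() - true_var)
```

With these assumed values, the divide-by-n estimator has expected value (n − 1)/n · σ² = 0.9 · 4 = 3.6, so its simulated bias comes out near −0.4, while the sample variance shows a bias near zero.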
Types
Statistical bias can arise at any stage of data analysis. The sources of bias below are listed by stage.
Hypothesis testing
Type I and type II errors in statistical hypothesis testing lead to wrong conclusions. A Type I error occurs when the null hypothesis is true but is rejected. For instance, suppose that the null hypothesis is that a driver is not speeding when the average driving speed lies between 75 and 85 km/h, and that speeds outside that range count as speeding. If a driver whose average speed actually falls within that range nevertheless receives a ticket, the decision maker has committed a Type I error: the average driving speed satisfies the null hypothesis, which was nonetheless rejected. Conversely, a Type II error occurs when the null hypothesis is false but is accepted.
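A minimal simulation sketch of the two error types, reusing the 75–85 km/h decision rule above; the drivers' true speeds, the measurement noise and the number of trials are made-up values used only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Decision rule from the example above: reject the null hypothesis
# ("not speeding") whenever the measured average speed is outside 75-85 km/h.
def is_ticketed(measured):
    return (measured < 75.0) | (measured > 85.0)

# Assumed values for the sketch: driver A truly averages 80 km/h (null true),
# driver B truly averages 95 km/h (null false); each measurement carries
# Gaussian noise with an assumed standard deviation of 5 km/h.
noise_sd, trials = 5.0, 100_000
measured_a = 80.0 + rng.normal(0.0, noise_sd, trials)
measured_b = 95.0 + rng.normal(0.0, noise_sd, trials)

type_i_rate = is_ticketed(measured_a).mean()      # null true, but rejected
type_ii_rate = (~is_ticketed(measured_b)).mean()  # null false, but accepted

print("estimated Type I error rate: ", type_i_rate)
print("estimated Type II error rate:", type_ii_rate)
```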
Estimator selection
The bias of an estimator is the difference between an estimator's expected value and the true value of the parameter being estimated. Although an unbiased estimator is theoretically preferable to a biased estimator, in practice, biased estimators with small biases are frequently used. A biased estimator may be more useful for several reasons. First, an unbiased estimator may not exist without further assumptions. Second, sometimes an unbiased estimator is hard to compute. Third, a biased estimator may have a lower value of mean squared error.
- For example, when a single observation X is drawn from a Poisson distribution with mean λ and the quantity e^(−2λ) is to be estimated, the only unbiased estimator is (−1)^X, which alternates between +1 and −1 and is useless as a point estimate. A biased plug-in estimator such as e^(−2X) is always positive and has a smaller mean squared error, which makes it the more accurate and more useful choice (see the simulation sketch after this list).
- Omitted-variable bias is the bias that appears in estimates of parameters in regression analysis when the assumed specification omits an independent variable that should be in the model.
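The Poisson counterexample mentioned in the first item above can be checked numerically. The sketch below is only illustrative: the value of λ and the number of replications are arbitrary, and e^(−2X) is one reasonable plug-in choice of biased estimator rather than the only possible one.

```python
import numpy as np

# Assumed values for the sketch: lambda and the number of replications are
# arbitrary; e^(-2X) is one reasonable plug-in choice of biased estimator.
rng = np.random.default_rng(2)
lam, reps = 1.0, 200_000
target = np.exp(-2.0 * lam)        # quantity being estimated: e^(-2*lambda)

x = rng.poisson(lam, reps)         # one Poisson observation per replication

unbiased = (-1.0) ** x             # the only unbiased estimator: (-1)^X
biased = np.exp(-2.0 * x)          # biased plug-in estimator: e^(-2X)

for name, est in (("unbiased (-1)^X", unbiased), ("biased e^(-2X)", biased)):
    print(f"{name}: mean = {est.mean():.4f}, "
          f"MSE = {np.mean((est - target) ** 2):.4f}")
```

The unbiased estimator averages out near e^(−2λ) ≈ 0.135, but because it only ever takes the values +1 and −1 its mean squared error is close to 1, whereas the biased plug-in estimator's mean squared error is several times smaller.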
Analysis methods
- Detection bias occurs when a phenomenon is more likely to be observed in a particular set of study subjects. For instance, the syndemic involving obesity and diabetes may mean doctors are more likely to look for diabetes in obese patients than in thinner patients, leading to an inflated observed rate of diabetes among obese patients because of skewed detection efforts.
- In educational measurement, bias is defined as "Systematic errors in test content, test administration, and/or scoring procedures that can cause some test takers to get either lower or higher scores than their true ability would merit." The source of the bias is irrelevant to the trait the test is intended to measure.
- Observer bias arises when the researcher subconsciously influences the experiment due to cognitive bias, whereby judgment may alter how the experiment is carried out or how the results are recorded.
Interpretation
Reporting bias involves a skew in the availability of data, such that observations of a certain kind are more likely to be reported.
References
- Neyman, Jerzy; Pearson, Egon S. (1936). "Contributions to the theory of testing statistical hypotheses". Statistical Research Memoirs. 1: 1–37.
- Romano, Joseph P.; Siegel, A. F. (1986-06-01). Counterexamples in Probability And Statistics. CRC Press. ISBN 978-0-412-98901-8.
- Hardy, Michael (2003). "An Illuminating Counterexample". The American Mathematical Monthly. 110 (3): 234–238. doi:10.2307/3647938. ISSN 0002-9890. JSTOR 3647938.
- National Council on Measurement in Education (NCME). "NCME Assessment Glossary". Archived from the original on 2017-07-22.