Revision as of 12:57, 20 July 2023
Situation where the mean of many measurements differs significantly from the actual value
For other uses, see Bias (disambiguation).
Statistical bias is a systematic tendency that causes differences between results and facts. Bias can arise at several stages of data analysis, including the source of the data, the estimator chosen, and the way the data are analyzed. Bias can seriously affect results. For example, in a study of people's buying habits, if the sample is not representative of the whole population, the results may not reflect the buying habits of everyone, and there may be discrepancies between the survey results and reality. Understanding the sources of statistical bias therefore helps in assessing whether the observed results are close to reality.
Bias can be distinguished from other mistakes, such as inaccuracy (instrument failure or inadequacy), lack of data, or mistakes in transcription (typos). Bias implies that the data selection may have been skewed by the collection criteria.
Bias does not preclude the existence of any other mistakes. One may have a poorly designed sample, an inaccurate measurement device, and typos in recording data simultaneously.
It is also useful to recognize that the term “error” specifically refers to the outcome rather than the process (errors of rejecting or accepting the hypothesis being tested). Using flaw or mistake to differentiate procedural errors from these specifically defined, outcome-based terms is recommended.
Bias of an estimator
Main article: Bias of an estimator
Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated. The bias of an estimator of a parameter should not be confused with its degree of precision, as the degree of precision is a measure of the sampling error. The bias is defined as follows: let $T$ be a statistic used to estimate a parameter $\theta$, and let $\operatorname{E}(T)$ denote the expected value of $T$. Then,
$$\operatorname{bias}(T, \theta) = \operatorname{bias}(T) = \operatorname{E}(T) - \theta$$
is called the bias of the statistic $T$ (with respect to $\theta$). If $\operatorname{bias}(T, \theta) = 0$, then $T$ is said to be an unbiased estimator of $\theta$; otherwise, it is said to be a biased estimator of $\theta$.
The bias of a statistic is always relative to the parameter it is used to estimate, but the parameter is often omitted when it is clear from the context what is being estimated.
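As a sketch of the definition $\operatorname{bias}(T, \theta) = \operatorname{E}(T) - \theta$, the Python code below computes the exact bias of two sample-variance estimators by enumerating every equally likely sample of size 2 drawn with replacement from the hypothetical population {0, 1, 2}, whose variance is 2/3. The population, the sample size and the function names are illustrative assumptions, not part of any standard library. Dividing by $n$ gives a biased estimator of the variance, while dividing by $n - 1$ gives an unbiased one.

```python
from fractions import Fraction
from itertools import product


def expected_value(statistic, values, n):
    """Exact E(T) for an i.i.d. sample of size n drawn uniformly, with
    replacement, from `values`: average T over every equally likely sample."""
    samples = list(product(values, repeat=n))
    return sum(Fraction(statistic(s)) for s in samples) / len(samples)


def bias(statistic, values, n, theta):
    """bias(T, theta) = E(T) - theta."""
    return expected_value(statistic, values, n) - theta


def var_divide_by_n(sample):
    """Sample variance with divisor n (a biased estimator of the variance)."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / len(sample)


def var_divide_by_n_minus_1(sample):
    """Sample variance with divisor n - 1 (an unbiased estimator of the variance)."""
    mean = Fraction(sum(sample), len(sample))
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)


population = [0, 1, 2]  # hypothetical population, each value equally likely
mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)  # 2/3

print("divisor n:    ", bias(var_divide_by_n, population, 2, sigma2))
print("divisor n - 1:", bias(var_divide_by_n_minus_1, population, 2, sigma2))
```

Here the divide-by-$n$ estimator has bias -1/3 (it underestimates the variance on average), while the divide-by-$(n - 1)$ estimator has bias 0.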
Types
Statistical bias can arise at any stage of data analysis. The sources of bias below are grouped by the stage at which they occur.
Data selection
Selection bias occurs when some individuals are more likely than others to be selected for study, biasing the sample. It is also termed selection effect, sampling bias, or Berksonian bias.
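A minimal sketch of how unequal selection probabilities bias an estimate, assuming a hypothetical population of four people with monthly spending 10, 20, 30 and 40, and a survey in which each person's chance of being selected is proportional to their spending (for example, because heavy shoppers are easier to reach). The hypothetical function below computes the exact expected value of one selected individual; for a sample drawn with replacement, the expected sample mean equals this value.

```python
from fractions import Fraction


def expected_selected_value(values, weights):
    """Exact expected value of one selected individual when individual i
    is chosen with probability weights[i] / sum(weights)."""
    total_weight = sum(weights)
    return sum(Fraction(w, total_weight) * v for v, w in zip(values, weights))


spending = [10, 20, 30, 40]  # hypothetical population
true_mean = Fraction(sum(spending), len(spending))

fair = expected_selected_value(spending, [1, 1, 1, 1])  # equal chances
skewed = expected_selected_value(spending, spending)    # chance proportional to spending

print("true mean:", true_mean)
print("equal-probability selection:", fair, "bias:", fair - true_mean)
print("spending-weighted selection:", skewed, "bias:", skewed - true_mean)
```

Equal selection probabilities give an expected value of 25, matching the true mean, whereas spending-weighted selection gives 30, a bias of 5 that collecting a larger sample in the same way does not remove.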
- Spectrum bias arises from evaluating diagnostic tests on biased patient samples, leading to an overestimate of the sensitivity and specificity of the test. For example, a high prevalence of disease in a study population increases the positive predictive value, so predictive values estimated in that population can differ systematically from those in the population where the test is actually applied.
- Observer selection bias occurs when the evidence presented has been pre-filtered by the conditions that allow observers to exist, as described by the so-called anthropic principle. The data collected are filtered not only by the design of the experiment but also by the necessary precondition that someone must exist to carry out the study. An example is past impact events on Earth: a large impact may have caused the extinction of intelligent animals, or there were no intelligent animals at that time. Therefore, some impact events may have occurred in the past without having been observed.
- Volunteer bias occurs when volunteers have intrinsically different characteristics from the target population of the study. Research has shown that volunteers tend to come from families with higher socioeconomic status. Furthermore, another study shows that women are more likely than men to volunteer for studies.
- Funding bias may lead to the selection of outcomes, test samples, or test procedures that favor a study's financial sponsor.
- Attrition bias arises from a loss of participants, e.g., loss to follow-up during a study.
- Recall bias arises from differences in the accuracy or completeness of participants' recollections of past events; for example, patients may not be able to recall exactly how many cigarettes they smoked last week, leading to overestimation or underestimation.
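The prevalence effect described under spectrum bias can be sketched with Bayes' theorem: the positive predictive value is $\text{PPV} = \frac{\text{sens} \cdot p}{\text{sens} \cdot p + (1 - \text{spec})(1 - p)}$, where $p$ is the prevalence. The Python code below assumes a hypothetical test with sensitivity and specificity both equal to 0.9 and evaluates the PPV at two illustrative prevalences; the numbers and the function name are chosen for illustration only.

```python
from fractions import Fraction


def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test), by Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


sens = Fraction(9, 10)
spec = Fraction(9, 10)

study_ppv = positive_predictive_value(sens, spec, Fraction(1, 2))    # high-prevalence study sample
clinic_ppv = positive_predictive_value(sens, spec, Fraction(1, 20))  # lower-prevalence target population

print(float(study_ppv), float(clinic_ppv))
```

With identical sensitivity and specificity, the PPV is 0.9 at 50% prevalence but about 0.32 at 5% prevalence, so a PPV measured in the high-prevalence study sample would overstate how often a positive result is correct in the lower-prevalence population.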
Hypothesis testing
Type I and type II errors in statistical hypothesis testing lead to wrong conclusions. A Type I error occurs when the null hypothesis is true but is rejected. For instance, suppose the null hypothesis is that a driver whose average speed is between 75 and 85 km/h is not speeding, while an average speed outside that range counts as speeding. If a driver whose average speed lies within that range receives a ticket, the decision maker has committed a Type I error: the driver's average speed satisfies the null hypothesis, but the hypothesis was rejected. Conversely, a Type II error occurs when the null hypothesis is false but is accepted.
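As a sketch of the two error types, assume a two-sided z-test that rejects the null hypothesis when $|Z| > z_c$, with $Z \sim N(0, 1)$ under the null hypothesis and $Z \sim N(\delta, 1)$ under a hypothetical alternative. The Type I error probability is then $2(1 - \Phi(z_c))$ and the Type II error probability is $\Phi(z_c - \delta) - \Phi(-z_c - \delta)$, where $\Phi$ is the standard normal distribution function. The critical values, the shift $\delta = 2$ and the function names are illustrative choices.

```python
import math


def normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def error_rates(z_crit, shift):
    """Type I and Type II error probabilities of a two-sided z-test that
    rejects H0 when |Z| > z_crit, with Z ~ N(0, 1) under H0 and
    Z ~ N(shift, 1) under the alternative."""
    type1 = 2.0 * (1.0 - normal_cdf(z_crit))  # reject although H0 is true
    type2 = normal_cdf(z_crit - shift) - normal_cdf(-z_crit - shift)  # accept although H0 is false
    return type1, type2


for z_crit in (1.96, 1.645):
    alpha, beta = error_rates(z_crit, shift=2.0)
    print(f"z_crit={z_crit}: Type I = {alpha:.4f}, Type II = {beta:.4f}")
```

Lowering the critical value from 1.96 to 1.645 raises the Type I error probability from about 0.05 to about 0.10 while lowering the Type II error probability, illustrating the trade-off between the two.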
Estimator selection
The bias of an estimator is the difference between an estimator's expected value and the true value of the parameter being estimated. Although an unbiased estimator is theoretically preferable to a biased estimator, in practice, biased estimators with small biases are frequently used. A biased estimator may be more useful for several reasons. First, an unbiased estimator may not exist without further assumptions. Second, sometimes an unbiased estimator is hard to compute. Third, a biased estimator may have a lower value of mean squared error.
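To illustrate the third point, consider estimating the variance $\sigma^2$ of normally distributed data from $n$ observations by $S/c$, where $S = \sum_i (x_i - \bar{x})^2$. Assuming normality, $S/\sigma^2$ has a chi-squared distribution with $n - 1$ degrees of freedom, which gives $\operatorname{MSE}(S/c)/\sigma^4 = \left(\frac{n-1}{c} - 1\right)^2 + \frac{2(n-1)}{c^2}$. The sketch below evaluates this formula exactly for the illustrative choice $n = 10$; the unbiased divisor $c = n - 1$ does not minimize it, while the biased divisor $c = n + 1$ does. The function name is hypothetical.

```python
from fractions import Fraction


def mse_over_sigma4(n, c):
    """MSE of S / c as an estimator of sigma^2, divided by sigma^4, where
    S = sum((x_i - xbar)^2) for n i.i.d. normal observations.

    Uses S / sigma^2 ~ chi-squared(n - 1), so E[S] = (n - 1) sigma^2 and
    Var[S] = 2 (n - 1) sigma^4.
    """
    k = n - 1
    bias = Fraction(k, c) - 1          # (E[S / c] - sigma^2) / sigma^2
    variance = Fraction(2 * k, c * c)  # Var[S / c] / sigma^4
    return bias ** 2 + variance


n = 10
for c in (n - 1, n, n + 1):
    print(f"divisor {c}: MSE / sigma^4 = {mse_over_sigma4(n, c)}")

best = min(range(1, 50), key=lambda c: mse_over_sigma4(n, c))
print("divisor with the smallest MSE:", best)
```

For $n = 10$ this gives 2/9 for $c = 9$, 19/100 for $c = 10$ and 2/11 for $c = 11$: the biased divisors accept a small bias in exchange for a larger reduction in variance.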
- When estimating $e^{-2\lambda}$ from a single observation $X$ of a Poisson distribution with mean $\lambda$, the only unbiased estimator is $(-1)^X$, which takes only the values $1$ and $-1$. The biased estimator $e^{-2X}$ is always positive and has a smaller mean squared error than the unbiased one, which makes the biased estimator more accurate.
- Omitted-variable bias is the bias that appears in estimates of parameters in regression analysis when the assumed specification omits an independent variable that should be in the model.
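A minimal sketch of omitted-variable bias, using hypothetical noise-free data generated as $y = 2x + 3z$, with $z$ correlated with $x$. Regressing $y$ on $x$ alone (with an intercept) gives a slope of $\beta_x + \beta_z b_{zx}$, where $b_{zx}$ is the slope from regressing $z$ on $x$, rather than the true coefficient $\beta_x = 2$. The data values and the function name are illustrative assumptions.

```python
from fractions import Fraction


def ols_slope(x, y):
    """Least-squares slope b in the fitted line y = a + b * x."""
    n = len(x)
    mean_x = Fraction(sum(x), n)
    mean_y = Fraction(sum(y), n)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


beta_x, beta_z = 2, 3  # hypothetical true coefficients
x = [1, 2, 3, 4, 5]
z = [2, 1, 4, 3, 5]    # hypothetical variable correlated with x
y = [beta_x * xi + beta_z * zi for xi, zi in zip(x, z)]  # noise-free outcome

short = ols_slope(x, y)  # model that omits z
print("estimated slope on x:", short)
print("beta_x + beta_z * slope(z on x):", beta_x + beta_z * ols_slope(x, z))
```

The short regression returns 22/5 = 4.4 instead of 2; the extra 2.4 is $\beta_z$ times the slope of $z$ on $x$ (0.8), which is the part of the effect of $z$ that the misspecified model attributes to $x$.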
Analysis methods
- Detection bias occurs when a phenomenon is more likely to be observed for a particular set of study subjects. For instance, the syndemic involving obesity and diabetes may mean doctors are more likely to look for diabetes in obese patients than in thinner patients, inflating the recorded rate of diabetes among obese patients because of skewed detection efforts.
- In educational measurement, bias is defined as "Systematic errors in test content, test administration, and/or scoring procedures that can cause some test takers to get either lower or higher scores than their true ability would merit." The source of the bias is irrelevant to the trait the test is intended to measure.
- Observer bias arises when the researcher unconsciously influences the experiment through cognitive bias, where judgment may alter how an experiment is carried out or how results are recorded.
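The detection-bias mechanism can be sketched with simple arithmetic. Assume, hypothetically, that two groups have the same true diabetes prevalence of 10%, but that 90% of true cases are diagnosed in the group doctors examine more closely and only 50% in the other group. Ignoring false positives, recorded prevalence is then the product of true prevalence and the share of cases detected; the function name and figures are illustrative.

```python
from fractions import Fraction


def recorded_prevalence(true_prevalence, detection_rate):
    """Share of a group recorded as having a condition when only a fraction
    `detection_rate` of true cases is diagnosed (no false positives assumed)."""
    return true_prevalence * detection_rate


true_prev = Fraction(1, 10)  # same in both groups (hypothetical)
screened = recorded_prevalence(true_prev, Fraction(9, 10))   # group examined more closely
unscreened = recorded_prevalence(true_prev, Fraction(1, 2))  # group examined less closely

print(screened, unscreened, screened / unscreened)
```

Although the true prevalence is identical, the recorded prevalence is 9% versus 5%, making the condition appear 1.8 times as common in the more closely examined group.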
Interpretation
Reporting bias involves a skew in the availability of data, such that observations of a certain kind are more likely to be reported.
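A sketch of how selective reporting skews the available data, assuming that each study's effect estimate is normally distributed around the true effect $\mu$ with standard error 1, and that a study is reported only when its estimate exceeds a threshold $t$ (here 1.96, mimicking reporting only "significant" positive results). The mean of the reported estimates is then the mean of a truncated normal distribution, $\mu + \phi(a)/(1 - \Phi(a))$ with $a = t - \mu$. The threshold, the zero true effect and the function names are illustrative assumptions.

```python
import math


def normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_sf(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def mean_of_reported(true_effect, threshold):
    """Expected value of an estimate ~ N(true_effect, 1), given that it is
    reported only when it exceeds `threshold` (truncated-normal mean)."""
    a = threshold - true_effect
    return true_effect + normal_pdf(a) / normal_sf(a)


print(f"mean of reported estimates: {mean_of_reported(0.0, 1.96):.3f}")
```

With a true effect of zero, the reported estimates average about 2.34, so a reader who sees only the reported studies would infer a substantial effect where none exists.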
References
- Rothman, Kenneth J.; Greenland, Sander; Lash, Timothy L. (2008). Modern Epidemiology. Lippincott Williams & Wilkins. pp. 134–137.
- Mulherin, Stephanie A.; Miller, William C. (2002-10-01). "Spectrum bias or spectrum effect? Subgroup variation in diagnostic test evaluation". Annals of Internal Medicine. 137 (7): 598–602. doi:10.7326/0003-4819-137-7-200210010-00011. ISSN 1539-3704. PMID 12353947. S2CID 35752032.
- Bostrom, Nick (2013-05-31). Anthropic Bias: Observation Selection Effects in Science and Philosophy. New York: Routledge. doi:10.4324/9780203953464. ISBN 978-0-203-95346-4.
- Ćirković, Milan M.; Sandberg, Anders; Bostrom, Nick (2010). "Anthropic Shadow: Observation Selection Effects and Human Extinction Risks". Risk Analysis. 30 (10): 1495–1506. doi:10.1111/j.1539-6924.2010.01460.x. ISSN 1539-6924. PMID 20626690. S2CID 6485564.
- Tripepi, Giovanni; Jager, Kitty J.; Dekker, Friedo W.; Zoccali, Carmine (2010). "Selection Bias and Information Bias in Clinical Research". Nephron Clinical Practice. 115 (2): c94–c99. doi:10.1159/000312871. ISSN 1660-2110. PMID 20407272. S2CID 18856450.
- "Volunteer bias". Catalog of Bias. 2017-11-17. Retrieved 2021-12-18.
- Alex, Evans (2020). "Why Do Women Volunteer More Than Men?". Retrieved 2021-12-22.
- Krimsky, Sheldon (2013-07-01). "Do Financial Conflicts of Interest Bias Research?: An Inquiry into the "Funding Effect" Hypothesis". Science, Technology, & Human Values. 38 (4): 566–587. doi:10.1177/0162243912456271. ISSN 0162-2439. S2CID 42598982.
- Higgins, Julian P. T.; Green, Sally (March 2011). "8. Introduction to sources of bias in clinical trials". In Higgins, Julian P. T.; et al. (eds.). Cochrane Handbook for Systematic Reviews of Interventions (version 5.1). The Cochrane Collaboration.
- Neyman, Jerzy; Pearson, Egon S. (1936). "Contributions to the theory of testing statistical hypotheses". Statistical Research Memoirs. 1: 1–37.
- Romano, Joseph P.; Siegel, A. F. (1986-06-01). Counterexamples in Probability and Statistics. CRC Press. ISBN 978-0-412-98901-8.
- Hardy, Michael (2003). "An Illuminating Counterexample". The American Mathematical Monthly. 110 (3): 234–238. doi:10.2307/3647938. ISSN 0002-9890. JSTOR 3647938.
- National Council on Measurement in Education (NCME). "NCME Assessment Glossary". Archived from the original on 2017-07-22.