Misplaced Pages

Bias (statistics)

Article snapshot taken from Wikipedia with Creative Commons Attribution-ShareAlike license.
Edit by C.Hua Wang at 23:43, 19 December 2021 (summary: Classified the type of bias); previous revision at 23:27, 19 December 2021 (summary: Removed "Exclusion bias" (repeated)).

Revision as of 23:43, 19 December 2021

Situation where the mean of many measurements differs significantly from the actual value

Statistical bias is a systematic tendency that causes the results of a statistical analysis to differ from the true values they are meant to describe. Bias can enter at many points in the process of data analysis, including the source of the data, the choice of estimator, and the way the data are analyzed, and it can seriously distort results. For example, a survey of people's buying habits whose sample is drawn from a group that does not represent the whole population may give results that differ systematically from the population's actual buying habits. Understanding the sources of statistical bias therefore helps in judging whether results are close to the true values.

Mathematical form

Statistical bias is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated. The bias of an estimator of a parameter should not be confused with its degree of precision, as the degree of precision is a measure of the sampling error. The bias is defined as follows: let $T$ be a statistic used to estimate a parameter $\theta$, and let $\operatorname{E}(T)$ denote the expected value of $T$. Then,

$$\operatorname{bias}(T,\theta) = \operatorname{bias}(T) = \operatorname{E}(T) - \theta$$

is called the bias of the statistic $T$ (with respect to $\theta$). If $\operatorname{bias}(T,\theta) = 0$, then $T$ is said to be an unbiased estimator of $\theta$; otherwise, it is said to be a biased estimator of $\theta$.

There is no universally accepted standard notation for the bias; it is commonly denoted by $\operatorname{bias}$, $\operatorname{Bias}$ or $\operatorname{BIAS}$. The bias of a statistic $T$ is always relative to the parameter $\theta$ it is used to estimate, but $\theta$ is often omitted when it is clear from the context what is being estimated.
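The definition $\operatorname{bias}(T,\theta) = \operatorname{E}(T) - \theta$ can be checked exactly on a toy example. The sketch below assumes samples of size $n$ drawn independently, with replacement, from a small, equally weighted finite population, so that $\operatorname{E}(T)$ can be computed by enumerating every possible sample; the population and all function names are hypothetical. It compares the sample mean and two variance estimators, one dividing by $n$ and one by $n-1$.

```python
from fractions import Fraction
from itertools import product


def expected_value(statistic, population, n):
    """Exact E(T): the average of `statistic` over every ordered sample of
    size n drawn with replacement from an equally weighted `population`."""
    samples = list(product(population, repeat=n))
    return sum(statistic(s) for s in samples) / len(samples)


def sample_mean(sample):
    return Fraction(sum(sample), len(sample))


def variance_divide_by_n(sample):
    m = sample_mean(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


def variance_divide_by_n_minus_1(sample):
    m = sample_mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)


def bias(statistic, theta, population, n):
    """bias(T, theta) = E(T) - theta."""
    return expected_value(statistic, population, n) - theta


# Hypothetical population; its true mean and variance are the parameters theta.
population = [1, 2, 3, 4]
mu = sample_mean(population)               # true mean: 5/2
sigma2 = variance_divide_by_n(population)  # true variance: 5/4

for n in (2, 3):
    print(n,
          bias(sample_mean, mu, population, n),
          bias(variance_divide_by_n, sigma2, population, n),
          bias(variance_divide_by_n_minus_1, sigma2, population, n))
```

The sample mean and the divide-by-$(n-1)$ variance come out unbiased, while the divide-by-$n$ variance has bias $-\sigma^2/n$ (here $-5/8$ for $n = 2$). Exact fractions are used instead of a Monte Carlo simulation so that the bias is computed without sampling error.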

Types

Statistical bias can arise at every stage of data analysis. The sources of bias below are grouped by the stage at which they arise.

Data selection

  • Selection bias occurs when some individuals are more likely than others to be selected for study, biasing the sample. It is also termed the selection effect, sampling bias or Berksonian bias.
    • Spectrum bias arises from evaluating diagnostic tests on biased patient samples, leading to an overestimate of the sensitivity and specificity of the test. For example, a high prevalence of disease in a study population increases the positive predictive value, so the predictive values estimated in that population will differ systematically from those in the population where the test is actually used.
    • Observer selection bias occurs when the evidence presented has been pre-filtered by the requirement that observers exist to collect it, an idea related to the anthropic principle. The data collected are filtered not only by the design of the experiment but also by the necessary precondition that someone must be there to carry out the study. An example is the record of past impacts on Earth: an impact might have caused the extinction of intelligent animals, or might have occurred when no intelligent animals existed. Such impact events may therefore have gone unobserved even though they occurred in the past.
    • Volunteer bias occurs when volunteers have intrinsically different characteristics from the target population of the study. Research has shown that volunteers tend to come from families with higher socioeconomic status, and another study shows that women are more likely than men to volunteer for studies.
    • Funding bias may lead to the selection of outcomes, test samples, or test procedures that favor a study's financial sponsor.
    • Attrition bias arises from a loss of participants, e.g. loss to follow-up during a study.
    • Recall bias arises from differences in the accuracy or completeness of participants' recollections of past events; for example, patients may not recall exactly how many cigarettes they smoked in the past week, leading to overestimation or underestimation.
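The prevalence effect mentioned under spectrum bias follows from Bayes' theorem: the positive predictive value depends on prevalence as well as on the test's sensitivity and specificity. The minimal sketch below uses hypothetical figures (90% sensitivity and specificity, 50% prevalence in a study sample versus 5% in routine use); the function name is illustrative only.

```python
from fractions import Fraction


def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test), computed with Bayes' theorem."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)


# Hypothetical test with 90% sensitivity and 90% specificity.
sens = spec = Fraction(9, 10)
study_ppv = positive_predictive_value(sens, spec, Fraction(1, 2))      # 50% prevalence
practice_ppv = positive_predictive_value(sens, spec, Fraction(1, 20))  # 5% prevalence
print(study_ppv, practice_ppv)  # 9/10 9/28
```

With the same test characteristics, a positive predictive value of 9/10 measured in the high-prevalence study sample would overstate the value of 9/28 (about 0.32) in the lower-prevalence population.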

Hypothesis testing

  • In statistical hypothesis testing, a test at significance level α (between 0 and 1) is said to be unbiased if the probability of rejecting the null hypothesis is less than or equal to α for every parameter value in the null hypothesis, and greater than or equal to α for every parameter value in the alternative hypothesis.
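As an illustration of this definition, consider a hypothetical z-test of H0: μ = μ0 against the two-sided alternative μ ≠ μ0, with known standard deviation and α = 0.05, where δ = √n(μ − μ0)/σ is the standardized shift. The sketch below, which assumes Python's standard `statistics.NormalDist` and uses illustrative function names, computes the rejection probability (power) of the usual two-sided test and of a test that rejects only for large values of the statistic.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution
ALPHA = 0.05


def power_two_sided(delta, alpha=ALPHA):
    """P(reject H0) when rejecting for |Z| > c; Z ~ N(delta, 1)."""
    c = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(-c - delta) + (1 - STD_NORMAL.cdf(c - delta))


def power_one_sided(delta, alpha=ALPHA):
    """P(reject H0) when rejecting only for Z > c; Z ~ N(delta, 1)."""
    c = STD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STD_NORMAL.cdf(c - delta)


for delta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"{delta:+.1f}  two-sided {power_two_sided(delta):.3f}"
          f"  one-sided {power_one_sided(delta):.3f}")
```

The two-sided test rejects with probability α at δ = 0 and with probability at least α everywhere else, so it is unbiased against μ ≠ μ0. The one-sided test rejects with probability below α when μ < μ0, so it is biased against that two-sided alternative.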

Estimator selection

  • The bias of an estimator is the difference between an estimator's expected value and the true value of the parameter being estimated.
    • Omitted-variable bias is the bias that appears in estimates of parameters in regression analysis when the assumed specification omits an independent variable that should be in the model.
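A minimal sketch of omitted-variable bias, using hypothetical noise-free data in which y = 2·x1 + 3·x2 and x2 is correlated with x1. Regressing y on x1 alone gives a slope equal to β1 + β2·δ, where δ is the slope from regressing the omitted x2 on x1; without noise this identity holds exactly in the sample. The data and the helper `ols_slope` are illustrative assumptions.

```python
from fractions import Fraction


def ols_slope(x, y):
    """Least-squares slope of y on x in a simple regression with an intercept."""
    n = len(x)
    mean_x = Fraction(sum(x), n)
    mean_y = Fraction(sum(y), n)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical, noise-free data: y = 2*x1 + 3*x2, with x2 correlated with x1.
beta1, beta2 = 2, 3
x1 = [0, 1, 2, 3, 4]
x2 = [0, 1, 1, 2, 2]
y = [beta1 * a + beta2 * b for a, b in zip(x1, x2)]

short_slope = ols_slope(x1, y)  # regression that omits x2
delta = ols_slope(x1, x2)       # slope of the omitted variable on x1
print(short_slope, beta1 + beta2 * delta)  # 7/2 7/2
```

The short regression reports a slope of 7/2 instead of the true β1 = 2; the excess, β2·δ = 3/2, is the omitted-variable bias.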

Analysis methods

  • Detection bias occurs when a phenomenon is more likely to be observed in a particular set of study subjects. For instance, the syndemic involving obesity and diabetes may mean that doctors are more likely to look for diabetes in obese patients than in thinner patients, inflating the recorded rate of diabetes among obese patients because of skewed detection efforts.
  • In educational measurement, bias is defined as "Systematic errors in test content, test administration, and/or scoring procedures that can cause some test takers to get either lower or higher scores than their true ability would merit. The source of the bias is irrelevant to the trait the test is intended to measure."
  • Observer bias arises when researchers unconsciously influence an experiment through cognitive bias, for example when their judgment alters how the experiment is carried out or how the results are recorded.
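The detection-bias example can be made concrete with hypothetical numbers. The sketch below assumes two groups of 1,000 patients with the same true diabetes prevalence (10%), a perfect test, and no diagnoses outside screening; only the screening rates differ (80% versus 40%). All names and figures are illustrative.

```python
from fractions import Fraction


def recorded_prevalence(n_patients, n_true_cases, screening_rate):
    """Share of a group recorded as diabetic when only screened cases are found.
    Assumes (hypothetically) a perfect test and no diagnoses outside screening."""
    detected = n_true_cases * screening_rate
    return Fraction(detected) / n_patients


# Hypothetical groups with the same true prevalence but unequal screening effort.
obese = recorded_prevalence(1000, 100, Fraction(8, 10))
thinner = recorded_prevalence(1000, 100, Fraction(4, 10))
print(obese, thinner, obese / thinner)  # 2/25 1/25 2
```

Although both groups have the same true prevalence, the recorded prevalence in the more heavily screened group is twice as high, purely because of unequal detection effort.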

Interpretation

  • Reporting bias involves a skew in the availability of data, such that observations of a certain kind are more likely to be reported.
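A minimal sketch of how reporting bias skews the available data, assuming hypothetical effect estimates from five studies of the same intervention and a hypothetical rule that only estimates above a cut-off are reported. Averaging only the reported estimates overstates the average over all studies.

```python
from fractions import Fraction


def average(values):
    return sum(values, Fraction(0)) / len(values)


# Hypothetical effect estimates from five studies of the same intervention.
estimates = [Fraction(-1, 10), Fraction(1, 10), Fraction(2, 10),
             Fraction(3, 10), Fraction(5, 10)]
threshold = Fraction(1, 4)  # hypothetical cut-off: only larger estimates get reported

reported = [e for e in estimates if e > threshold]
print(average(estimates), average(reported))  # 1/5 2/5
```

Here a reader who sees only the reported studies would estimate an average effect of 2/5, twice the average of 1/5 across all studies.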

See also

References

  1. Rothman, Kenneth J.; Greenland, Sander; Lash, Timothy L. (2008). Modern Epidemiology. Lippincott Williams & Wilkins. pp. 134–137.
  2. Mulherin, Stephanie A.; Miller, William C. (2002-10-01). "Spectrum bias or spectrum effect? Subgroup variation in diagnostic test evaluation". Annals of Internal Medicine. 137 (7): 598–602. doi:10.7326/0003-4819-137-7-200210010-00011. ISSN 1539-3704. PMID 12353947.
  3. Bostrom, Nick (2013-05-31). Anthropic Bias: Observation Selection Effects in Science and Philosophy. New York: Routledge. doi:10.4324/9780203953464/anthropic-bias-nick-bostrom. ISBN 978-0-203-95346-4.
  4. Ćirković, Milan M.; Sandberg, Anders; Bostrom, Nick (2010). "Anthropic Shadow: Observation Selection Effects and Human Extinction Risks". Risk Analysis. 30 (10): 1495–1506. doi:10.1111/j.1539-6924.2010.01460.x. ISSN 1539-6924.
  5. Tripepi, Giovanni; Jager, Kitty J.; Dekker, Friedo W.; Zoccali, Carmine (2010). "Selection Bias and Information Bias in Clinical Research". Nephron Clinical Practice. 115 (2): c94–c99. doi:10.1159/000312871. ISSN 1660-2110. PMID 20407272.
  6. "Volunteer bias". Catalog of Bias. 2017-11-17. Retrieved 2021-12-18.
  7. Alex, Evans (2020). "Why Do Women Volunteer More Than Men?". Retrieved 2020.
  8. Krimsky, Sheldon (2013-07-01). "Do Financial Conflicts of Interest Bias Research?: An Inquiry into the "Funding Effect" Hypothesis". Science, Technology, & Human Values. 38 (4): 566–587. doi:10.1177/0162243912456271. ISSN 0162-2439.
  9. Higgins, Julian P. T.; Green, Sally (March 2011). "8. Introduction to sources of bias in clinical trials". In Higgins, Julian P. T.; et al. (eds.). Cochrane Handbook for Systematic Reviews of Interventions (version 5.1). The Cochrane Collaboration.
  10. Neyman, Jerzy; Pearson, Egon S. (1936). "Contributions to the theory of testing statistical hypotheses". Statistical Research Memoirs. 1: 1–37.
  11. National Council on Measurement in Education (NCME). "NCME Assessment Glossary". Archived from the original on 2017-07-22.