User:Carwil/Human genetic clustering: Difference between revisions


Revision as of 13:00, 24 March 2011 by Miradre (9,214 edits): →References: not on templates anymore		Revision as of 13:24, 24 March 2011 by Miradre (9,214 edits): unclear, incorrect, poorly written, clustering analysis only one technique, discussion and arguments covered better in Race and genetics article
Line 1:		Line 1:
		⚫	'''Human genetic clustering''' analysis uses mathematical [[cluster analysis]] to infer population genetic structures. A similar analysis can be done using [[principal components analysis]], which was a popular method in earlier research.<ref>Population Structure and Eigenanalysis, Nick Patterson, Alkes L. Price, David Reich. PLoS Genet 2(12): e190. doi:10.1371/journal.pgen.0020190</ref> Many recent studies have returned to using principal components analysis. See [[race and genetics]] for a discussion of research using these techniques.
	{{science review\|date=October 2010}}
	{{rewrite\|date=October 2010}}
	⚫	'''Human genetic clustering''' analysis uses mathematical [[cluster analysis]] ~~of the degree of similarity of genetic data between individuals and groups to~~ infer population ~~structures and assign individuals to groups that often correspond with their self-identified geographical ancestry. Some, as in the 2003 paper "]", have argued that population~~ structures ~~are evidence for the existence of different human races.~~ A similar analysis can be done using [[principal components analysis]], which was a popular method in earlier research.<ref>Population Structure and Eigenanalysis, Nick Patterson, Alkes L. Price, David Reich. PLoS Genet 2(12): e190. doi:10.1371/journal.pgen.0020190</ref> Many recent studies have returned to using principal components analysis.

	==Studies==
	{{main\|Race and genetics}}
	In 2004, Lynn Jorde and Stephen Wooding argued that "Analysis of many loci now yields reasonably accurate estimates of genetic similarity among individuals, rather than populations. Clustering of individuals is correlated with geographic origin or ancestry."<ref name="jorde">Lynn B Jorde & Stephen P Wooding, 2004, "Genetic variation, classification and 'race'" in ''Nature Genetics'' 36, S28–S33 </ref>

	] divides a dataset into any prespecified number of clusters.) Individuals have genes from multiple clusters. The cluster prevalent only among the ] people (yellow) splits off only at K=7 and greater.]]
	] et al. 2002, 2005). Individuals from 52 populations were examined at 993 DNA markers. These data were used to partition individuals into K = 2, 3, 4, 5, or 6 gene clusters. In this figure, the average fractional membership of individuals from each population is represented by horizontal bars partitioned into K colored segments.]
	A study by ] in 2005 used 326 ] markers and self-identified race/ethnic group (SIRE). The SIRE categories were white (European American), African-American (black), Asian and Hispanic, and individuals in the study had to choose one of them. Each category was treated as a discrete "population". The study showed distinct and non-overlapping clustering of the white, African-American and Asian samples. The results were claimed to confirm the integrity of self-described ancestry: "We have shown a nearly perfect correspondence between genetic cluster and SIRE for major ethnic groups living in the United States, with a discrepancy rate of only 0.14%."(Tang, 2005){{Full}}

	Studies such as those by Risch and ] use a computer program called STRUCTURE to find human populations (gene clusters). It is a statistical program that works by placing individuals into one of two clusters based on their overall genetic similarity; many possible pairs of clusters are tested per individual to generate multiple clusters.<ref name="Witherspoon">"Genetic Similarities Within and Between Human Populations" (2007) by D.J. Witherspoon, S. Wooding, A.R. Rogers, E.E. Marchani, W.S. Watkins, M.A. Batzer and L.B. Jorde. ''Genetics.'' '''176'''(1): 351–359.</ref> These populations are based on multiple genetic markers that are often shared between different human populations, even over large geographic ranges. The notion of a genetic cluster is that people within a cluster share, on average, more similar allele frequencies with each other than with people in other clusters (], 2003; but see also infobox "Multi Locus Allele Clusters"). In a test of idealised populations, STRUCTURE was found to consistently underestimate the number of populations in the data set when high migration rates between populations and slow mutation rates (such as ]s) were considered.<ref name="population?">Waples, R. S. and Gaggiotti, O. (2006) ''What is a population? An empirical evaluation of some genetic methods for identifying the number of gene pools and their degree of connectivity.'' Molecular Ecology '''15:''' 1419–1439. {{doi\|10.1111/j.1365-294X.2006.02890.x}}</ref>
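STRUCTURE itself fits a Bayesian model by MCMC. The Python sketch below is a much simpler stand-in that shows the core idea of clustering by allele frequencies. It uses expectation-maximisation (EM) for a no-admixture mixture model in which each of K clusters has its own allele frequency at every locus. The sketch makes three assumptions: genotypes are coded as 0/1/2 copies of one allele, loci are independent, and genotypes follow Hardy-Weinberg proportions. The functions `em_cluster` and `_em_run` and the simulated data are hypothetical illustrations, not STRUCTURE's algorithm or interface.

```python
import numpy as np

def _em_run(g, k, n_iter, rng):
    """One EM run from a random start; returns (resp, freqs, log_likelihood)."""
    n, m = g.shape
    p = rng.uniform(0.2, 0.8, size=(k, m))  # random start breaks the symmetry
    weights = np.full(k, 1.0 / k)
    total = -np.inf
    for _ in range(n_iter):
        # E-step: log-likelihood of each individual's genotypes in each cluster
        # (binomial terms; the binomial coefficient is identical across clusters).
        loglik = g @ np.log(p).T + (2.0 - g) @ np.log(1.0 - p).T + np.log(weights)
        top = loglik.max(axis=1, keepdims=True)  # for numerical stability
        resp = np.exp(loglik - top)
        total = float(np.sum(top[:, 0] + np.log(resp.sum(axis=1))))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: cluster weights and allele frequencies; pseudocounts keep
        # every frequency strictly between 0 and 1.
        counts = resp.sum(axis=0)
        weights = (counts + 1.0) / (n + k)
        p = (resp.T @ g + 0.5) / (2.0 * counts[:, None] + 1.0)
    return resp, p, total

def em_cluster(genotypes, k, n_iter=100, n_starts=5, seed=0):
    """Soft-assign individuals (rows of a 0/1/2 matrix) to k clusters.

    Runs EM from several random starts and keeps the run with the highest
    log-likelihood at its last E-step. Returns (responsibilities, freqs).
    """
    rng = np.random.default_rng(seed)
    g = np.asarray(genotypes, dtype=float)
    best = None
    for _ in range(n_starts):
        run = _em_run(g, k, n_iter, rng)
        if best is None or run[2] > best[2]:
            best = run
    return best[0], best[1]

# Usage on simulated data: two hypothetical populations typed at 300 loci.
rng = np.random.default_rng(1)
freq_a = rng.uniform(0.1, 0.9, 300)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, 300), 0.05, 0.95)
genos = np.vstack([rng.binomial(2, freq_a, size=(50, 300)),
                   rng.binomial(2, freq_b, size=(50, 300))])
resp, _ = em_cluster(genos, k=2)
labels = resp.argmax(axis=1)
print(np.bincount(labels, minlength=2))
```

As in STRUCTURE, K has to be chosen in advance here. Cluster labels are arbitrary, so a "correct" result is one that separates the two simulated populations, whichever label each receives.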

	Nevertheless, the Rosenberg ''et al.'' (2002) paper shows that individuals can be assigned to specific clusters with a high degree of accuracy. One of the underlying questions about the distribution of human genetic diversity is the degree to which genes are shared between the observed clusters. It has been observed repeatedly that most of the variation in the global human population is found within populations. This apportionment is usually calculated using [[Sewall Wright]]'s [[fixation index]] (F<sub>ST</sub>), which expresses the variation between groups as a proportion of the total variation. The apportionment differs somewhat depending on the type of genetic marker studied. In general, however, it is commonly stated that ~85% of genetic variation is found within groups, ~6–10% between groups within the same continent and ~6–10% between continental groups. For example, the Human Genome Project states that "two random individuals from any one group are almost as different as any two random individuals from the entire world."<ref name="witherspoon">''Genetic Similarities Within and Between Human Populations'' by D. J. Witherspoon, S. Wooding, A. R. Rogers, E. E. Marchani, W. S. Watkins, M. A. Batzer, and L. B. Jorde. Genetics. 2007 May; 176(1): 351–359.</ref> Sarich and Miele, however, have argued that estimates of genetic difference between individuals of different populations fail to take human diploidy into account.
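As a worked illustration of how such an apportionment is computed, the Python sketch below uses one common heterozygosity-based formulation: F<sub>ST</sub> = (H<sub>T</sub> − H<sub>S</sub>) / H<sub>T</sub>. Here H<sub>S</sub> is the mean expected heterozygosity within populations, and H<sub>T</sub> is the expected heterozygosity of the pooled population. The sketch assumes a single biallelic locus and equally weighted populations. The studies cited may use different estimators, and the function names are hypothetical. On this definition, 1 − F<sub>ST</sub> is the share of variation found within populations, which is the quantity behind figures such as ~85%.

```python
def heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) of a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)

def fst(freqs):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs: frequency of the same allele in each population; populations are
    weighted equally (a simplifying assumption).
    """
    if not freqs:
        raise ValueError("need at least one population")
    h_s = sum(heterozygosity(p) for p in freqs) / len(freqs)  # mean within-population
    h_t = heterozygosity(sum(freqs) / len(freqs))             # pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t             # monomorphic locus -> 0

f = fst([0.2, 0.8])
print(f"F_ST = {f:.2f}; share within populations = {1 - f:.2f}")
```

With allele frequencies of 0.2 and 0.8, H<sub>S</sub> = 0.32 and H<sub>T</sub> = 0.5, so F<sub>ST</sub> = 0.18 / 0.5 = 0.36.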

	{{quotation\|The point is that we are diploid organisms, getting one set of chromosomes from one parent and a second from the other. To the extent that your mother and father are not especially closely related, then, those two sets of chromosomes will come close to being a random sample of the chromosomes in your population. And the sets present in some randomly chosen member of yours will also be about as different from your two sets as they are from one another. So how much of the variability will be distributed where?

	First is the 15 percent that is interpopulational. The other 85 percent will then split half and half (42.5 percent) between the intra- and interindividual within-population comparisons. The increase in variability in between-population comparisons is thus 15 percent against the 42.5 percent that is between-individual within-population. Thus, 15/42.5 is 32.5 percent, a much more impressive and, more important, more legitimate value than 15 percent.<ref>Sarich VM, Miele F. Race: The Reality of Human Differences. Westview Press (2004). ISBN 0-8133-4086-1</ref>}}
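The arithmetic of the diploid argument in the quotation can be restated in a few lines of Python. The 15/85 split is taken as the quotation's premise rather than as an established figure, and the variable names are illustrative. By this calculation 15/42.5 is about 35.3 percent, somewhat higher than the 32.5 percent printed in the quoted passage.

```python
# Premise of the quotation: 15% of variation lies between populations, 85% within.
between_populations = 15.0
within_populations = 85.0
# Diploid argument: the within-population share splits evenly between the two
# chromosome sets inside one person and the differences between two people.
between_individuals = within_populations / 2
ratio = between_populations / between_individuals
print(f"{between_populations} / {between_individuals} = {ratio:.1%}")
```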

	Additionally, Edwards (2003) claims in his essay "]" that "It is not true, as ''Nature'' claimed, that 'two random individuals from any one group are almost as different as any two random individuals from the entire world'", and Risch ''et al.'' (2002) state "Two Caucasians are more similar to each other genetically than a Caucasian and an Asian." These two statements are not equivalent. Risch ''et al.'' simply state that two ] individuals from the same geographical region are more similar to each other than either is to an indigenous individual from a different geographical region, a claim few would argue with. Jorde and Wooding (2004) put it this way:

	{{quotation\|The picture that begins to emerge from this and other analyses of human genetic variation is that variation tends to be geographically structured, such that most individuals from the same geographic region will be more similar to one another than to individuals from a distant region.<ref name="jorde"/>}}

	Edwards, by contrast, disputes the claim that differences between individuals from different geographical regions represent only a small proportion of the variation within the human population. He argues that differences between individuals within a group are not almost as large as differences between groups. Bamshad ''et al.'' (2004) used the data from Rosenberg ''et al.'' (2002) to investigate the genetic differences between individuals within continental groups, relative to the differences between individuals in different continental groups. They found that these individuals could be assigned to continental clusters very accurately. Even so, there was a significant degree of genetic overlap at the individual level: using 377 loci, an individual European was more genetically similar to an East Asian than to another European about 38% of the time.

	{{Infobox multi locus allele clusters}}

	The existence of allelic clines and the observation that the bulk of human variation is continuously distributed have led some scientists to conclude that any categorization schema attempting to partition that variation meaningfully will necessarily create artificial truncations (Kittles & Weiss 2003). Reanne Frank argues that this is why attempts to allocate individuals into ancestry groupings based on genetic information have yielded varying results that are highly dependent on methodological design.<ref name="Frank"></ref> Serre and Pääbo (2004) make a similar claim:{{quotation\|The absence of strong continental clustering in the human gene pool is of practical importance. It has recently been claimed that “the greatest genetic structure that exists in the human population occurs at the racial level” (Risch et al. 2002). Our results show that this is not the case, and we see no reason to assume that “races” represent any units of relevance for understanding human genetic history.}}

	In a response to Serre and Pääbo (2004), Rosenberg ''et al.'' (2005) make three relevant observations. First, they maintain that their clustering analysis is robust. Second, they agree with Serre and Pääbo that membership of multiple clusters can be interpreted as evidence for clinality (isolation by distance), though they note that it may also result from admixture between neighbouring groups (small island model). Third, they comment that evidence of clusteredness is not evidence for any concept of "biological race".<ref name="rosenberg2005">Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK, ''et al.'' (2005) ''Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure.'' PLoS Genet '''1(6)''': e70 {{doi\|10.1371/journal.pgen.0010070}}</ref>

	Risch ''et al.'' (2002) state that "two Caucasians are more similar to each other genetically than a Caucasian and an Asian". However, Bamshad ''et al.'' (2004)<ref name="bamshad 2004">Bamshad, Wooding, Salisbury and Stephens (2004) ''Deconstructing the relationship between genetics and race.'' Nature Reviews Genetics '''5'''(8):598–609. {{doi\|10.1038/nrg1401}}</ref> used the same data set as Rosenberg ''et al.'' (2002) to show that, when only 377 microsatellite markers are analysed, Europeans are more similar to Asians than to other Europeans 38% of the time.

	{\| class="wikitable" style="text-align:center"
	\|+ Percentage of comparisons in which two individuals from different clusters are genetically more similar to each other than to a member of their own cluster, when 377 microsatellite markers are considered.<ref>The table gives the percentage likelihood that two individuals from different clusters are genetically more similar to each other than to someone from their own population when 377 microsatellite markers are considered, from Bamshad ''et al.'' (2004){{doi\|10.1038/nrg1401}}; original data from Rosenberg ''et al.'' (2002).</ref>
	\|-
	! !! Africans !! Europeans !! Asians
	\|-
	! Europeans
	\| 36.5 \|\| — \|\| —
	\|-
	! Asians
	\| 35.5 \|\| 38.3 \|\| —
	\|-
	! Indigenous Americans
	\| 26.1 \|\| 33.4 \|\| 35
	\|}

	In agreement with the observation of Bamshad ''et al.'' (2004), Witherspoon ''et al.'' (2007) have shown that many more than 326 or 377 microsatellite loci are required before individuals are always found to be more similar to members of their own population group than to members of other population groups, even for three distinct populations.<ref name="witherspoon"/>

	Witherspoon ''et al.'' (2007) have argued that even when individuals can be reliably assigned to specific population groups, two randomly chosen individuals from different populations/clusters may still be more similar to each other than to a randomly chosen member of their own cluster. They asked: "How often is a pair of individuals from one population genetically more dissimilar than two individuals chosen from two different populations?" They found that many thousands of genetic markers had to be used before the answer became "never". This result assumed three population groups separated by large geographic distances (European, African and East Asian). The entire world population is much more complex, and studying more groups would require more markers to obtain the same answer. Witherspoon ''et al.'' conclude that "caution should be used when using geographic or genetic ancestry to make inferences about individual phenotypes."
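Why the answer depends on the number of markers can be illustrated with a small simulation. The Python sketch below estimates the proportion Witherspoon ''et al.'' asked about: how often a pair drawn from one population is more dissimilar than a pair drawn from two populations. It uses simulated genotypes, not the Rosenberg or Witherspoon data. Its assumptions are two populations drifting independently from shared ancestral allele frequencies, independent biallelic loci, and a simple allele-sharing-style distance. The function name `omega` and every parameter value are hypothetical.

```python
import numpy as np

def omega(n_loci, n_per_pop=60, n_pairs=2000, drift_sd=0.15, seed=0):
    """Fraction of comparisons in which a within-population pair is MORE
    dissimilar than a between-population pair (simulated data only)."""
    rng = np.random.default_rng(seed)
    ancestral = rng.uniform(0.1, 0.9, n_loci)
    # Two populations drift independently away from the same ancestral frequencies.
    freq_a = np.clip(ancestral + rng.normal(0.0, drift_sd, n_loci), 0.01, 0.99)
    freq_b = np.clip(ancestral + rng.normal(0.0, drift_sd, n_loci), 0.01, 0.99)
    pop_a = rng.binomial(2, freq_a, size=(n_per_pop, n_loci))
    pop_b = rng.binomial(2, freq_b, size=(n_per_pop, n_loci))

    def distance(x, y):
        # Mean absolute difference in allele counts per locus.
        return np.abs(x - y).mean(axis=1)

    # Within-population pairs: two distinct members of population A.
    i = rng.integers(0, n_per_pop, n_pairs)
    j = (i + rng.integers(1, n_per_pop, n_pairs)) % n_per_pop
    within = distance(pop_a[i], pop_a[j])
    # Between-population pairs: one member of A and one member of B.
    between = distance(pop_a[rng.integers(0, n_per_pop, n_pairs)],
                       pop_b[rng.integers(0, n_per_pop, n_pairs)])
    return float(np.mean(within > between))

for loci in (10, 100, 1000):
    print(loci, omega(loci))
```

In such a simulation the proportion should fall towards zero as loci are added. How many loci are needed depends on how differentiated the simulated populations are, which matches the point that more groups, or more closely related groups, require more markers.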

	Clusters do not necessarily correspond to continental divisions. Depending on the parameters given to their analytical program, Rosenberg and Pritchard were able to divide the genomes studied into between 4 and 20 clusters, although they excluded analyses with more than 6 clusters from their published article. Probability values for the various cluster configurations varied widely. The single most likely configuration had 16 clusters, although other 16-cluster configurations had low probabilities. Overall, "there is no clear evidence that K=6 was the best estimate", according to geneticist Deborah Bolnick (2008:76-77).<ref>{{Cite book \| publisher = Rutgers University Press \| isbn = 9780813543246 \| editor1-last = Koenig \| editor1-first = Barbara A. \| editor2-last = Richardson \| editor2-first = Sarah S. \| editor3-last = Lee \| editor3-first = Sandra Soo-Jin \| last = Bolnick \| first = Deborah A. \| title = Revisiting race in a genomic age \| chapter = Individual Ancestry Inference and the Reification of Race as a Biological Phenomenon \| year = 2008 \| postscript = . }}</ref>


	==References==		==References==
	{{Reflist}}		{{Reflist}}

	{{Use dmy dates\|date=October 2010}}

	{{DEFAULTSORT:Human Genetic Clustering}}

Revision as of 13:24, 24 March 2011

Human genetic clustering analysis uses mathematical cluster analysis to infer population genetic structures. A similar analysis can be done using principal components analysis, which was a popular method in earlier research. Many recent studies have returned to using principal components analysis. See race and genetics for a discussion of research using these techniques.
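Principal components analysis of genotype data can be sketched in Python with NumPy. This is a minimal illustration, not the method of any particular study. It assumes a matrix of 0/1/2 genotype counts and centres and scales each locus by its allele frequency, similar in spirit to the normalisation described by Patterson, Price and Reich. The function name `genotype_pcs` and the simulated populations are hypothetical.

```python
import numpy as np

def genotype_pcs(genotypes, n_components=2):
    """Principal component scores of a 0/1/2 genotype matrix (individuals x loci).

    Each locus is centred by 2p and scaled by sqrt(p(1 - p)), where p is its
    sample allele frequency; monomorphic loci are dropped to avoid dividing by 0.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)
    g, p = g[:, keep], p[keep]
    z = (g - 2.0 * p) / np.sqrt(p * (1.0 - p))
    # Left singular vectors scaled by singular values are the PC scores.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Usage on simulated data: two hypothetical populations typed at 500 loci.
rng = np.random.default_rng(2)
freq_a = rng.uniform(0.1, 0.9, 500)
freq_b = np.clip(freq_a + rng.normal(0.0, 0.2, 500), 0.05, 0.95)
genos = np.vstack([rng.binomial(2, freq_a, size=(40, 500)),
                   rng.binomial(2, freq_b, size=(40, 500))])
pcs = genotype_pcs(genos)
print(pcs[:3])
```

Unlike a clustering program, PCA does not assign individuals to a fixed number of groups. It places them on continuous axes, and any grouping is read off the resulting scatter.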

References

Population Structure and Eigenanalysis, Nick Patterson, Alkes L. Price, David Reich. PLoS Genet 2(12): e190. doi:10.1371/journal.pgen.0020190