Patterns of variation in DNA segments upstream of transcription start sites.

Labuda D, Labbé C, Langlois S, Lefebvre JF, Freytag V, Moreau C, Sawicki J, Beaulieu P, Pastinen T, Hudson TJ, Sinnett D - Hum. Mutat. (2007)

Bottom Line: On average, we found 9.1 polymorphisms and 8.8 haplotypes per segment with corresponding nucleotide and haplotype diversities of 0.082% and 58%, respectively. Our results suggest that genetic diversity in some of these regions could have been shaped by purifying selection and driven by adaptive changes in others, thus explaining the relatively large variance in the corresponding genetic diversity indices among loci. However, some of these effects could also be due to linkage with surrounding sequences, and neutralist explanations cannot be ruled out given uncertainty in the underlying demographic histories and the possibility of random effects due to the small size of the studied segments.


Affiliation: Centre de Recherche, Hôpital Sainte-Justine, Montréal, Quebec, Canada. damian.labuda@umontreal.ca


Figure 1: Distributions of allelic frequency classes (left panels), of frequencies of haplotypes (middle) [Middleton et al., 1993], and of haplotype allelic classes (right) in CDC25A, CX3CR1, and GSTM3. Bars represent the observed values; lines represent theoretical distributions. The occupancy of allelic frequency classes corresponds to counts of sites represented by i new alleles in a sample of n chromosomes (i=1, 2, 3,…, n–1). Here, the theoretical curve (solid line) corresponds to the distribution calculated from the equation S_i = Θ/i [Fan et al., 2002; Fu, 1997], using Θπ (Table 1) as the estimator of Θ. The theoretical distribution (solid line) of haplotype frequencies expected given k observed haplotypes (Table 1) is according to Ewens [1972]. Haplotype names are arbitrary and correspond to their names in our database. Haplotype allelic classes regroup haplotypes sharing the same number of mutations from the ancestral haplotype; their theoretical occupancy was obtained by coalescent simulation under the standard model, assuming constant population size without (solid line) and with (dotted line) recombination, at 10-fold the genomic average in the case of segments where crossovers were detected.
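As a rough illustration of how the observed occupancy of haplotype allelic classes in the right panels could be tallied, the sketch below groups haplotypes by their number of derived (non-ancestral) alleles. The encoding of haplotypes as strings of "0" (ancestral) and "1" (derived), the function name, and the example counts are all assumptions made for illustration; they are not taken from the article or its database.

```python
from collections import Counter


def haplotype_allelic_classes(haplotype_counts):
    """Group haplotypes by their number of mutations from the ancestral haplotype.

    haplotype_counts maps a haplotype string ("0" = ancestral allele,
    "1" = derived allele at each site) to the number of chromosomes
    carrying it. Returns {number of derived alleles: chromosome count}.
    """
    classes = Counter()
    for haplotype, count in haplotype_counts.items():
        n_derived = haplotype.count("1")  # mutations away from the ancestral haplotype
        classes[n_derived] += count
    return dict(sorted(classes.items()))


# Hypothetical sample of 12 chromosomes over 4 segregating sites.
example = {"0000": 5, "1000": 3, "0100": 2, "1100": 1, "0011": 1}
print(haplotype_allelic_classes(example))
```

The theoretical occupancy in the figure comes from coalescent simulation, which this sketch does not attempt to reproduce.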

Mentions: The testing of data in the framework of the infinite sites model can be illustrated by a histogram of allelic frequency classes that regroup sites with the same number of copies of the derived allele, from i=1, 2, 3,… to i=n–1, where n is the number of chromosomes in the sample. The expected distribution is S_i = Θ/i [Fan et al., 2002; Fu, 1997], where Σ S_i = S, as illustrated in the left panels of Figure 1, where Θπ estimates (Table 1) were used to trace the theoretical curve according to the above equation. The corresponding plots for segments other than the three shown in Figure 1, either highlighted by neutrality tests or singled out by Fst statistics, can be found in Supplementary Figure S2. The histogram of allelic frequency classes in Figure 1 shows an excess of low-frequency polymorphisms in the case of CDC25A, as revealed by the negative Tajima's D in this segment (Table 2); it shows good concordance between the theoretical distribution and the data in the CX3CR1 segment and a marked excess of high-frequency derived alleles in the case of GSTM3. The latter agrees with the result of the Fay and Wu test for this segment (Table 2). Middle histograms in Figure 1 illustrate the results of the haplotype-based tests. In the case of CX3CR1, as for allelic frequency classes, this plot shows an excellent fit between the theoretical distribution and the observed frequencies. In these representations illustrating the results of the neutrality tests from Table 2, the CX3CR1 segment appears to conform to a simple neutral model. In contrast, in CDC25A the frequencies expected given the number of observed haplotypes do not match the data. There is an excess of observed haplotypes, given their homozygosity (1–G). This discordant distribution, in the case of the CDC25A segment, reflects significant results of haplotype-based tests, including Fu's Fs test, which, however, compares k with its estimate based on Θπ rather than Θhom.
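The comparison between observed allelic frequency classes and the Θ/i expectation can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the example data are hypothetical, Θ is taken per segment rather than per site, and Θπ is computed with the standard formula for the mean number of pairwise differences.

```python
def expected_site_counts(theta, n):
    """Expected number of sites with i derived alleles, theta / i, for i = 1..n-1."""
    return {i: theta / i for i in range(1, n)}


def observed_site_counts(derived_counts, n):
    """Tally segregating sites by the number of chromosomes carrying the derived allele."""
    observed = {i: 0 for i in range(1, n)}
    for c in derived_counts:
        if 0 < c < n:  # skip sites that are not segregating in the sample
            observed[c] += 1
    return observed


def theta_pi(derived_counts, n):
    """Mean number of pairwise differences per segment (one estimator of theta)."""
    return sum(2 * c * (n - c) for c in derived_counts) / (n * (n - 1))


# Hypothetical segment: 10 chromosomes, 6 segregating sites with these
# derived-allele counts (made-up values, not data from the article).
n = 10
derived_counts = [1, 1, 1, 2, 5, 9]
theta = theta_pi(derived_counts, n)
observed = observed_site_counts(derived_counts, n)
expected = expected_site_counts(theta, n)
for i in range(1, n):
    print(f"i={i}: observed={observed[i]}, expected={expected[i]:.2f}")
```

An excess of observed counts at small i relative to the curve corresponds to the pattern described for CDC25A, and an excess at large i to the pattern described for GSTM3.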
After correcting for multiple testing, no segment remained significant for either the Ewens-Watterson test or the Fay and Wu test. Furthermore, the significant results of Chakraborty's test for HTR2A and GPX2, as well as those of Fu's Fs test for GPX2, can likely be ascribed to the effect of recombination. Recombination causes the number of observed haplotypes to increase faster than it would through mutation alone and can thus render the results of the above tests falsely significant. Yet, at the same time, the presence of recombination renders other tests, such as Tajima's or Fay and Wu's, less conservative, i.e., “more significant” [Fay and Wu, 2000]. Indeed, considering the effect of recombination (three- to six-fold the genomic average) in six segments where more than one recombinant haplotype was observed (Table 1), GPX3 stayed significant for the Fay and Wu test after the correction for multiple testing.
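The excerpt does not say which multiple-testing correction was applied. As one common possibility, the sketch below implements the Holm-Bonferroni step-down procedure; the choice of method, the function name, and the p-values are assumptions for illustration only.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the test stays significant.

    Holm step-down procedure: sort p-values in ascending order and compare
    the p-value at 0-based rank r with alpha / (m - r); stop at the first
    comparison that fails.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            significant[idx] = True
        else:
            break  # every larger p-value is also non-significant
    return significant


# Hypothetical per-segment p-values from a single neutrality test.
p_values = [0.001, 0.04, 0.03, 0.2]
print(holm_bonferroni(p_values))
```

With these made-up values only the first segment survives correction, even though three of the four p-values fall below 0.05 when taken individually.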

