Patterns of variation in DNA segments upstream of transcription start sites.
Bottom Line: On average, we found 9.1 polymorphisms and 8.8 haplotypes per segment, with corresponding nucleotide and haplotype diversities of 0.082% and 58%, respectively. Our results suggest that genetic diversity in some of these regions could have been shaped by purifying selection and in others driven by adaptive changes, thus explaining the relatively large variance in the corresponding genetic diversity indices among loci. However, some of these effects could also be due to linkage with surrounding sequences, and neutralist explanations cannot be ruled out, given the uncertainty in the underlying demographic histories and the possibility of random effects due to the small size of the studied segments.
Affiliation: Centre de Recherche, Hôpital Sainte-Justine, Montréal, Quebec, Canada. email@example.com
Mentions: The testing of data in the framework of the infinite sites model can be illustrated by a histogram of allelic frequency classes that group sites carrying the same number of copies of the derived allele, from i = 1, 2, 3, … to i = n–1, where n is the number of chromosomes in the sample. The expected distribution is E(Si) = Θ/i [Fan et al., 2002; Fu, 1997], where Σ Si = S, as illustrated in the left panels of Figure 1, where Θπ estimates (Table 1) were used to trace the theoretical curve according to the above equation. The corresponding plots for segments other than the three shown in Figure 1, either highlighted by neutrality tests or singled out by Fst statistics, can be found in Supplementary Figure S2. The histogram of allelic frequency classes in Figure 1 shows an excess of low-frequency polymorphisms in the case of CDC25A, as revealed by the negative Tajima's D in this segment (Table 2); it shows a good concordance between the theoretical distribution and the data in the CX3CR1 segment and a marked excess of high-frequency derived alleles in the case of GSTM3. The latter agrees with the result of the Fay and Wu test for this segment (Table 2). The middle histograms in Figure 1 illustrate the results of the haplotype-based tests. In the case of CX3CR1, as for allelic frequency classes, this plot shows an excellent fit between the theoretical distribution and the observed frequencies. In these representations illustrating the results of the neutrality tests from Table 2, the CX3CR1 segment appears to conform to a simple neutral model. In contrast, in CDC25A the expected frequencies do not match the data: there is an excess of observed haplotypes, given their homozygosity (1–G). This discordant distribution, in the case of the CDC25A segment, reflects the significant results of haplotype-based tests, including Fu's Fs test, which, however, compares k with its estimate based on Θπ rather than Θhom.
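The frequency-class comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the input is assumed to be an unfolded site frequency spectrum (a list whose entry i-1 holds the number of sites carrying i copies of the derived allele), and the statistics follow the standard textbook definitions of Watterson's Θ, Θπ, Tajima's D and the unnormalized Fay and Wu H.

```python
from math import sqrt

def harmonic(n):
    """a1 = sum of 1/i for i = 1 .. n-1, with n the number of chromosomes."""
    return sum(1.0 / i for i in range(1, n))

def expected_sfs(theta, n):
    """Expected number of sites in frequency class i (i = 1 .. n-1): theta / i."""
    return [theta / i for i in range(1, n)]

def theta_watterson(sfs):
    """Theta estimated from the number of segregating sites S."""
    n = len(sfs) + 1
    return sum(sfs) / harmonic(n)

def theta_pi(sfs):
    """Theta estimated from the mean number of pairwise differences."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2
    return sum(i * (n - i) * s for i, s in enumerate(sfs, start=1)) / pairs

def theta_h(sfs):
    """Theta weighted toward high-frequency derived alleles (Fay and Wu)."""
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2
    return sum(i * i * s for i, s in enumerate(sfs, start=1)) / pairs

def fay_wu_h(sfs):
    """Unnormalized H = theta_pi - theta_H; negative with excess high-frequency derived alleles."""
    return theta_pi(sfs) - theta_h(sfs)

def tajimas_d(sfs):
    """Tajima's D; negative with an excess of low-frequency polymorphisms."""
    n = len(sfs) + 1
    S = sum(sfs)
    if S == 0:
        return 0.0
    a1 = harmonic(n)
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (theta_pi(sfs) - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Toy spectra for n = 10 chromosomes (illustrative counts, not data from the study).
singletons_only = [9, 0, 0, 0, 0, 0, 0, 0, 0]   # excess of rare variants
high_freq_only = [0, 0, 0, 0, 0, 0, 0, 0, 9]    # excess of high-frequency derived alleles
print(tajimas_d(singletons_only) < 0, fay_wu_h(high_freq_only) < 0)
```

With a spectrum that exactly follows Θ/i, all three Θ estimators coincide, so both D and H are zero; departures in either tail of the histogram pull the statistics in the directions described for CDC25A and GSTM3.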
After correcting for multiple testing, no segment remained significant for either the Ewens-Watterson test or the Fay and Wu test. Furthermore, the significant results of Chakraborty's test for HTR2A and GPX2, as well as those of Fu's Fs test for GPX2, can likely be ascribed to the effect of recombination. Recombination causes the number of observed haplotypes to increase faster than it would through mutation alone and can therefore render the results of the above tests falsely significant. Yet, at the same time, the presence of recombination renders other tests, such as Tajima's or Fay and Wu's, less conservative, i.e., “more significant” [Fay and Wu, 2000]. Indeed, considering the effect of recombination (three- to six-fold genomic average) in six segments where more than one recombinant haplotype was observed (Table 1), GPX3 stayed significant for the Fay and Wu test after the correction for multiple testing.
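The excerpt does not say which multiple-testing correction was applied. As a hedged illustration only, the sketch below shows two common choices, Bonferroni and Holm, applied to a list of per-segment p-values. The function names and the example p-values are hypothetical and are not taken from Table 2.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

def holm(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, k in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvalues[k])
        adjusted[k] = min(1.0, running_max)
    return adjusted

# Made-up p-values for three segments, for illustration only.
raw = [0.01, 0.04, 0.03]
print(bonferroni(raw))
print(holm(raw))
```

A segment "remains significant" in this sense when its adjusted p-value is still below the chosen threshold; Holm is never more conservative than Bonferroni.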