Evidence for evolutionary and nonevolutionary forces shaping the distribution of human genetic variants near transcription start sites.

Scala G, Affinito O, Miele G, Monticelli A, Cocozza S - PLoS ONE (2014)

Bottom Line: We found that, in this 10 kb region, the distribution of variants depends on their frequency and on their localization relative to the TSS. We found a significant relationship between the distribution of rare variants and nucleosome occupancy scores. In conclusion, this study provides a novel and detailed view of the distribution of genomic variants around TSSs, providing insight into the forces that instigate and maintain variability in such critical regions.


Affiliation: Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università degli Studi di Napoli "Federico II", Naples, Italy; Dipartimento di Fisica, Università degli Studi di Napoli "Federico II", Naples, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Napoli, Naples, Italy.

ABSTRACT
The regions surrounding transcription start sites (TSSs) of genes play a critical role in the regulation of gene expression. At the same time, current evidence indicates that these regions are particularly stressed by transcription-related mutagenic phenomena. In this work we performed a genome-wide analysis of the distribution of single nucleotide polymorphisms (SNPs) inside the 10 kb region flanking human TSSs by dividing SNPs into four classes according to their frequency (rare, two intermediate classes, and common). We found that, in this 10 kb region, the distribution of variants depends on their frequency and on their localization relative to the TSS. We found that the distribution of variants is generally different for TSSs located inside or outside of CpG islands. We found a significant relationship between the distribution of rare variants and nucleosome occupancy scores. Furthermore, our analysis suggests that evolutionary (purifying selection) and nonevolutionary (biased gene conversion) forces both play a role in determining the relative SNP frequency around TSSs. Finally, we analyzed the potential pathogenicity of each class of variant using the Combined Annotation Dependent Depletion score. In conclusion, this study provides a novel and detailed view of the distribution of genomic variants around TSSs, providing insight into the forces that instigate and maintain variability in such critical regions.

Figure 6 (pone-0114432-g006): gBGC score distribution is different between CGI-TSSs and nCGI-TSSs. The BBS values are plotted together for CGI-TSSs (black line) and nCGI-TSSs (red line). The x-axis shows the position of the bin relative to the TSS.

Mentions: To identify possible signatures of natural selection, we analyzed the conservation profiles of the analyzed regions using Genomic Evolutionary Rate Profiling (GERP) scores [29]. High values of this score indicate fewer substitutions among species than expected under a neutral rate (derived by maximum likelihood estimation of the evolutionary rate), and hence high evolutionary conservation. To evaluate the possible presence of gBGC phenomena, we used the “phastBias” gBGC track from UCSC. From this track we obtained the bases predicted to be influenced by GC-biased gene conversion (gBGC bases) [30]. We determined the “bin average GERP score” (BGS) by computing, for a fixed bin, the GERP values averaged over bin loci and over all TSSs (Figure 5). By an analogous process, we obtained the “bin average gBGC score” (BBS) by computing, for a fixed bin, the average number of gBGC bases over all considered TSSs (Figure 6). We found that the BGS distribution differed between CGI-TSSs and nCGI-TSSs (Fisher p-value < 10^-4). For both TSS classes, we observed a peak in the region ∼100 bp upstream and ∼200 bp downstream of the TSS. A region with negative BGS signal was found 200–700 bp upstream of the CGI-TSSs only. The BBS signal also showed different distributions for CGI-TSSs and nCGI-TSSs, with a peak in the region from ∼2000 bp upstream to 2000 bp downstream of the CGI-TSSs. We then plotted BGS, BBS and BVF-delta values for nCGI-TSSs and CGI-TSSs (Figure S4, Figure S5). For nCGI-TSSs, Figure S4 shows an apparent direct correlation among the three variables. We tested these correlations and found a strong positive correlation between BVF-delta and BGS values (Pearson correlation coefficient = 0.725, p-value < 2.2 × 10^-16) and a weaker correlation between BVF-delta and BBS values (Pearson correlation coefficient = 0.557, p-value < 2.2 × 10^-16) (Figure 7). For CGI-TSSs, Figure S5 shows a more complex pattern. In particular, an inverse correlation appeared between BVF-delta and both BGS and BBS in the immediate vicinity of TSSs, whereas a direct correlation among the same signals was present in the complementary distal regions.
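
The bin-averaging and correlation steps described in this passage can be illustrated with a minimal Python sketch. It is not the authors' code. The bin width (`BIN_SIZE`), the input layout (one dictionary per TSS mapping a signed offset in bp to a per-base score), and the function names are all hypothetical, chosen only for illustration. Pooling every locus of a bin across all TSSs into one mean is just one possible reading of "averaged over bin loci and over all TSSs".

```python
import math
from collections import defaultdict

BIN_SIZE = 100  # hypothetical bin width in bp; the excerpt does not state it


def bin_index(offset, bin_size=BIN_SIZE):
    """Map a signed offset from the TSS (in bp) to a bin index (floor division)."""
    return offset // bin_size


def bin_average_scores(per_tss_scores, bin_size=BIN_SIZE):
    """Pooled mean of a per-base score for each bin, over all TSSs.

    per_tss_scores: list of dicts, one per TSS, mapping offset (bp) -> score.
    Returns a dict {bin_index: mean score}.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for scores in per_tss_scores:
        for offset, score in scores.items():
            b = bin_index(offset, bin_size)
            totals[b] += score
            counts[b] += 1
    return {b: totals[b] / counts[b] for b in totals}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Toy data (made up for illustration): two TSSs, three scored loci each.
toy_scores = [
    {-150: 1.0, -50: 3.0, 50: 2.0},
    {-150: 3.0, -50: 5.0, 50: 4.0},
]
profile = bin_average_scores(toy_scores)

# Correlate two bin profiles over the bins they share, in bin order.
other_profile = {b: 2.0 * v for b, v in profile.items()}
shared_bins = sorted(set(profile) & set(other_profile))
r = pearson([profile[b] for b in shared_bins],
            [other_profile[b] for b in shared_bins])
```

In this sketch a BGS-like profile and a BVF-delta-like profile would each be a dictionary keyed by bin index, and the correlation is computed across bins. The p-values reported in the text would need a separate significance test, which is not shown here.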

