Evidence for evolutionary and nonevolutionary forces shaping the distribution of human genetic variants near transcription start sites.

Scala G, Affinito O, Miele G, Monticelli A, Cocozza S - PLoS ONE (2014)

Bottom Line: We found that, in this 10 kb region, the distribution of variants depends on their frequency and on their localization relative to the TSS. We found a significant relationship between the distribution of rare variants and nucleosome occupancy scores. In conclusion, this study provides a novel and detailed view of the distribution of genomic variants around TSSs, providing insight into the forces that instigate and maintain variability in such critical regions.


Affiliation: Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università degli Studi di Napoli "Federico II", Naples, Italy; Dipartimento di Fisica, Università degli Studi di Napoli "Federico II", Naples, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Napoli, Naples, Italy.

ABSTRACT
The regions surrounding transcription start sites (TSSs) of genes play a critical role in the regulation of gene expression. At the same time, current evidence indicates that these regions are particularly stressed by transcription-related mutagenic phenomena. In this work we performed a genome-wide analysis of the distribution of single nucleotide polymorphisms (SNPs) inside the 10 kb region flanking human TSSs by dividing SNPs into four classes according to their frequency (rare, two intermediate classes, and common). We found that, in this 10 kb region, the distribution of variants depends on their frequency and on their localization relative to the TSS. We found that the distribution of variants is generally different for TSSs located inside or outside of CpG islands. We found a significant relationship between the distribution of rare variants and nucleosome occupancy scores. Furthermore, our analysis suggests that evolutionary (purifying selection) and nonevolutionary (biased gene conversion) forces both play a role in determining the relative SNP frequency around TSSs. Finally, we analyzed the potential pathogenicity of each class of variant using the Combined Annotation Dependent Depletion score. In conclusion, this study provides a novel and detailed view of the distribution of genomic variants around TSSs, providing insight into the forces that instigate and maintain variability in such critical regions.

Figure 9 (pone-0114432-g009): Deleteriousness scores show different position effects among TSS classes. BCS values are plotted on the same region for rare (black line), mid1 (red line), mid2 (green line) and common variants (blue line) for CGI-TSSs (left panel) and nCGI-TSSs (right panel). On the x-axis is the position of the bin relative to the TSS.

Mentions: As a last step, we analyzed the potential pathogenicity of each class of variants using the CADD (Combined Annotation Dependent Depletion) score [31]. High CADD values characterize variants that are likely to have deleterious effects; we refer to this property as deleteriousness. For both TSS classes and for each variant frequency class we computed the “bin average CADD score” (BCS): for a fixed bin, the CADD values averaged over the variants in the bin and over TSSs (Figure 9). As expected, we found a statistically significant difference (see Materials and Methods) among the four signals, with SNP deleteriousness generally decreasing as variant frequency increased. In all considered classes, deleteriousness increased moving toward the TSS from both sides. Finally, for each frequency class, significantly higher deleteriousness was seen for CGI-TSSs than for nCGI-TSSs in the region proximal (∼1300 bp) to the TSS (see Materials and Methods).
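
The BCS computation described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `bin_average_cadd`, the input layout (one tuple per variant), the 100 bp bin size, and the ±5000 bp window are all assumptions chosen for illustration. The excerpt does not state the exact averaging order, so the sketch assumes a two-stage mean: first over the variants of each TSS within a bin, then over the TSSs that contribute to that bin.

```python
from collections import defaultdict
from statistics import mean


def bin_average_cadd(variants, bin_size=100, half_window=5000):
    """Return {bin_start: BCS} for one TSS class and one frequency class.

    variants: iterable of (tss_id, offset_bp, cadd_score), where offset_bp
    is the variant position relative to its TSS (negative = upstream).
    bin_size and half_window are illustrative assumptions, not values
    taken from the paper.
    """
    # Stage 1: collect CADD scores per (bin, TSS) pair.
    per_tss_bin = defaultdict(list)
    for tss_id, offset, score in variants:
        if -half_window <= offset < half_window:
            # Floor division keeps upstream bins aligned, e.g. -50 -> -100.
            bin_start = (offset // bin_size) * bin_size
            per_tss_bin[(bin_start, tss_id)].append(score)

    # Stage 2: average within each TSS, then across TSSs for each bin.
    per_bin = defaultdict(list)
    for (bin_start, _tss_id), scores in per_tss_bin.items():
        per_bin[bin_start].append(mean(scores))

    return {b: mean(v) for b, v in sorted(per_bin.items())}


# Toy input with made-up scores; the last variant lies outside the window.
example = [
    ("tssA", -50, 10.0),
    ("tssA", -20, 20.0),
    ("tssB", -70, 5.0),
    ("tssB", 130, 8.0),
    ("tssA", 6000, 99.0),
]
print(bin_average_cadd(example))
```

To reproduce curves like those in Figure 9 under these assumptions, the function would be called once per frequency class (rare, mid1, mid2, common) and per TSS class (CGI-TSS, nCGI-TSS), and each resulting dictionary plotted against bin position.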

