Evidence for evolutionary and nonevolutionary forces shaping the distribution of human genetic variants near transcription start sites.

Scala G, Affinito O, Miele G, Monticelli A, Cocozza S - PLoS ONE (2014)

Bottom Line: We found that, in this 10 kb region, the distribution of variants depends on their frequency and on their localization relative to the TSS. We found a significant relationship between the distribution of rare variants and nucleosome occupancy scores. In conclusion, this study provides a novel and detailed view of the distribution of genomic variants around TSSs, providing insight into the forces that instigate and maintain variability in such critical regions.


Affiliation: Gruppo Interdipartimentale di Bioinformatica e Biologia Computazionale, Università degli Studi di Napoli "Federico II", Naples, Italy; Dipartimento di Fisica, Università degli Studi di Napoli "Federico II", Naples, Italy; Istituto Nazionale di Fisica Nucleare, Sezione di Napoli, Naples, Italy.

ABSTRACT
The regions surrounding transcription start sites (TSSs) of genes play a critical role in the regulation of gene expression. At the same time, current evidence indicates that these regions are particularly stressed by transcription-related mutagenic phenomena. In this work we performed a genome-wide analysis of the distribution of single nucleotide polymorphisms (SNPs) inside the 10 kb region flanking human TSSs by dividing SNPs into four classes according to their frequency (rare, two intermediate classes, and common). We found that, in this 10 kb region, the distribution of variants depends on their frequency and on their localization relative to the TSS. We found that the distribution of variants is generally different for TSSs located inside or outside of CpG islands. We found a significant relationship between the distribution of rare variants and nucleosome occupancy scores. Furthermore, our analysis suggests that evolutionary (purifying selection) and nonevolutionary (biased gene conversion) forces both play a role in determining the relative SNP frequency around TSSs. Finally, we analyzed the potential pathogenicity of each class of variant using the Combined Annotation Dependent Depletion score. In conclusion, this study provides a novel and detailed view of the distribution of genomic variants around TSSs, providing insight into the forces that instigate and maintain variability in such critical regions.

Figure 8 (pone-0114432-g008): GERP and gBGC correlations for inner and outer regions. Pearson correlations between BGS and BVF values in the outer region (left panel) and between BBS and BVF values in the inner region (right panel) are reported along with corresponding scatter plots for CGI-TSSs. * indicates statistically significant correlations.


Mentions: To better quantify such an involved pattern, we considered a generic symmetric window around the TSS and evaluated the correlations of BBS vs. BVF-delta and BGS vs. BVF-delta as a function of the window size. Since the first correlation, as an absolute value, was significantly larger than the second one in proximal (inner) regions, we chose to calculate the correlation of BBS and BVF-delta in the inner region and between BGS and BVF-delta in the complementary one. By using a window-based approach (see Materials and Methods), we were able to split the whole region into an inner one (∼700 bp region flanking the TSS), where BVF-delta is mainly correlated with BBS, and an outer complementary one, where BVF-delta is mainly correlated with BGS. Analysis of the two regions showed a strong positive correlation (rho = 0.77, p-value < 2.2 × 10^−16) between BGS and BVF-delta in the external region (Figure 8, left panel) and, conversely, a strong negative correlation (rho = −0.73, p-value = 5.78 × 10^−6) in the inner region between BVF-delta and BBS (Figure 8, right panel).
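The inner/outer split described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the per-bin input layout, the `inner_half_width` parameter, and the toy values are all assumptions made for illustration, and the actual window-selection procedure is the one given in the paper's Materials and Methods. The sketch only shows the general idea of computing one Pearson correlation (BBS vs. BVF-delta) for bins inside a symmetric window around the TSS and another (BGS vs. BVF-delta) for the bins outside it.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / sqrt(sxx * syy)


def split_correlations(positions, bbs, bgs, bvf_delta, inner_half_width):
    """Correlate BVF-delta with BBS inside the window and with BGS outside it.

    positions: bin centres in bp relative to the TSS (negative = upstream).
    inner_half_width: hypothetical half-width (bp) of the symmetric inner
    window; the source only states an inner region of roughly 700 bp.
    """
    inner = [i for i, p in enumerate(positions) if abs(p) <= inner_half_width]
    outer = [i for i, p in enumerate(positions) if abs(p) > inner_half_width]
    r_inner = pearson([bbs[i] for i in inner], [bvf_delta[i] for i in inner])
    r_outer = pearson([bgs[i] for i in outer], [bvf_delta[i] for i in outer])
    return r_inner, r_outer


# Toy data (made up, not from the paper): BVF-delta mirrors -BBS in the
# inner bins and follows BGS in the outer bins.
positions = [-900, -700, -500, -300, -100, 100, 300, 500, 700, 900]
bbs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
bgs = [9, 7, 5, 3, 1, 2, 4, 6, 8, 10]
bvf_delta = [-bbs[i] if abs(p) <= 350 else bgs[i]
             for i, p in enumerate(positions)]

r_inner, r_outer = split_correlations(positions, bbs, bgs, bvf_delta, 350)
print(f"inner r = {r_inner:.2f}, outer r = {r_outer:.2f}")
```

Repeating the call over a range of `inner_half_width` values would give the correlations as a function of window size, which is the kind of scan the text describes; the significance tests (p-values) reported above are not reproduced here.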

