Allele frequencies of variants in ultra conserved elements identify selective pressure on transcription factor binding.

Silla T, Kepp K, Tai ES, Goh L, Davila S, Catela Ivkovic T, Calin GA, Voorhoeve PM - PLoS ONE (2014)

Bottom Line: Association studies of UCEs for complex human traits can use this information to model the expected background variation and, thus, the power they require. By combining our data with 1000 Genomes Project data, we show in three independent datasets that prevalent UCE variants (MAF>5%) are more often found at relatively less-conserved nucleotides within UCEs than rare variants are. Using SNPfold, we found no significant influence of RNA secondary structure on UCE conservation.


Affiliation: Cancer and Stem Cell Biology Program, Duke-NUS Graduate Medical School, Singapore, Singapore.

ABSTRACT
Ultra-conserved genes or elements (UCGs/UCEs) in the human genome are extreme examples of conservation. We characterized natural variation in 2884 UCEs and UCGs in two distinct populations, Singaporean Chinese (n = 280) and Italian (n = 501), using a pooled-sample, targeted-capture sequencing approach. We identify with high confidence an abundance of rare SNVs (MAF<0.5%) in these regions, 75% of which are not present in dbSNP137. Association studies of UCEs for complex human traits can use this information to model the expected background variation and, thus, the power they require. By combining our data with 1000 Genomes Project data, we show in three independent datasets that prevalent UCE variants (MAF>5%) are more often found at relatively less-conserved nucleotides within UCEs than rare variants are. Moreover, prevalent variants are less likely to overlap transcription factor binding sites. Using SNPfold, we found no significant influence of RNA secondary structure on UCE conservation. Altogether, these results suggest that UCEs are not under selective pressure as a whole stretch of DNA but are under differential evolutionary pressure at the single-nucleotide level.
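The abstract sorts variants by minor allele frequency (MAF): rare below 0.5% and prevalent above 5%. A minimal Python sketch of that binning follows, assuming allele counts at a biallelic site are available; in the pooled-sample design described here, the frequency would instead be estimated from read proportions within each pool. The function names and the "intermediate" label for MAFs between 0.5% and 5% are hypothetical, not taken from the paper.

```python
def minor_allele_frequency(alt_allele_count: int, total_alleles: int) -> float:
    """Frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    if not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("alt_allele_count must lie in [0, total_alleles]")
    alt_freq = alt_allele_count / total_alleles
    # The minor allele may be either the reference or the alternate allele.
    return min(alt_freq, 1.0 - alt_freq)


def maf_class(maf: float) -> str:
    """Bin a MAF with the thresholds quoted in the abstract.

    rare: MAF < 0.5%; prevalent: MAF > 5%.
    'intermediate' for the band in between is a hypothetical label.
    """
    if maf < 0.005:
        return "rare"
    if maf > 0.05:
        return "prevalent"
    return "intermediate"


# 280 diploid individuals carry 560 chromosomes.
print(maf_class(minor_allele_frequency(1, 560)))    # a singleton
print(maf_class(minor_allele_frequency(500, 560)))  # minor allele is the reference
```

For example, a single alternate allele among 280 diploid individuals (560 chromosomes) gives MAF = 1/560, about 0.18%, which falls in the rare class.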


Figure 1. General characterization of SNVs in the UCEs. (A) Number of SNVs per megabase (Mb) of UCE sequence per sample. SNVs from three data sources were used: the Singaporean Chinese cohort (SG-CHN), the Italian cohort (ITA) and the 1000 Genomes Project (1 KG). SNVs are grouped by minor allele frequency (MAF). Numbers in parentheses give the sample size used in this study (see Materials and Methods). The random set consists of random genomic regions with the same total length as the UCE set. The y-axis shows SNVs per Mb divided by the number of samples in the analyzed population. (B–D) Shared and distinct SNVs between the SG-CHN, ITA and 1 KG populations: Venn diagrams of (B) all, (C) prevalent (MAF>0.5%) and (D) rare (MAF<0.5%) SNVs from the three analyzed populations. Numbers in parentheses indicate the number of SNVs analyzed in the corresponding population.
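Panel A normalizes SNV counts by target length and by sample count, and panels B–D count SNVs shared between datasets. Both steps can be sketched in Python as below. The function names and toy variant keys are hypothetical illustrations, and measuring "shared by all" against the union of all SNVs is an assumption about how the percentages are computed.

```python
def snvs_per_mb_per_sample(n_snvs: int, region_length_bp: int, n_samples: int) -> float:
    """Normalize an SNV count by target length (in Mb) and by sample count."""
    if region_length_bp <= 0 or n_samples <= 0:
        raise ValueError("region length and sample count must be positive")
    return n_snvs / (region_length_bp / 1_000_000) / n_samples


def shared_by_all(*datasets: set[str]) -> set[str]:
    """SNVs present in every dataset (the centre of a Venn diagram)."""
    if not datasets:
        return set()
    return set.intersection(*datasets)


def fraction_shared_by_all(*datasets: set[str]) -> float:
    """Share of all distinct SNVs (the union) that every dataset contains."""
    union = set().union(*datasets)
    return len(shared_by_all(*datasets)) / len(union) if union else 0.0


# Toy variant keys ("chrom:pos:ref>alt"); hypothetical, not data from the paper.
sg_chn = {"chr1:100:A>G", "chr1:200:C>T", "chr2:50:G>A", "chr3:10:T>C"}
ita = {"chr1:200:C>T", "chr2:50:G>A", "chr5:70:A>C"}
kg = {"chr1:200:C>T", "chr2:50:G>A", "chr6:30:G>T", "chr7:90:C>G"}

print(snvs_per_mb_per_sample(50, 1_000_000, 10))
print(sorted(shared_by_all(sg_chn, ita, kg)))
print(fraction_shared_by_all(sg_chn, ita, kg))
```

Applied to the 1 KG counts quoted in the text (13449 SNVs, 1092 samples, 8.7 SNVs per Mb per sample), the same formula back-calculates a target of roughly 1.4 Mb; this is an inference, not a figure reported by the authors.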

Within the targeted regions from the SG-CHN and ITA cohorts, we detected about 15.1 and 12.4 SNVs per Mb of targeted region per sample, respectively (Figure 1A). To obtain an additional, independent dataset, we also extracted all SNVs within the defined UCE regions from the 1000 Genomes Project (1 KG) phase 1 variants. This identified 13449 SNVs within the targeted regions from 1092 samples, which translates into 8.7 variants per Mb of targeted region per sample (Figure 1A). These numbers show that sequencing more samples does not lead to a proportional increase in SNVs per sample, most likely because detection of prevalent variants saturates and each sample carries relatively few private SNVs (Figure 1A). Next, we classified SNVs by their minor allele frequency (MAF) in the respective population. In concordance with previous results [8], [19], comparison with random genomic positions in the 1 KG dataset revealed a depletion of prevalent SNVs (MAF>5%) in UCEs (Figure 1A, compare 1 KG to 1 KG random). Comparing the three datasets revealed that almost 1400 SNVs (6.5% of all SNVs) are present in all of them (Figure 1B). A recent study that analyzed 202 protein-coding genes in 14002 people revealed an abundance of rare SNVs compared to common variants [38]. Similarly, our study shows that the majority of the detected SNVs in conserved non-coding regions are rare variants (MAF<0.5%) (Figure 1A). As expected, the majority (56%) of prevalent SNVs are present in all three datasets (Figure 1C). In contrast, only 27 rare SNVs (0.4%) are shared by all populations (Figure 1D).
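The observation that more samples do not yield proportionally more SNVs per sample can be illustrated with an idealized discovery model: a site whose allele has population frequency p is seen at least once among n diploid individuals with probability 1 - (1 - p)^(2n). The sketch below assumes independent sampling of chromosomes and perfect variant calling, and its site-frequency spectrum is invented for illustration; it is not the authors' analysis.

```python
from typing import Iterable


def detection_probability(allele_freq: float, n_individuals: int) -> float:
    """P(at least one copy of the allele among 2n sampled chromosomes).

    Idealized assumptions: diploid individuals, independent chromosomes,
    perfect variant calling.
    """
    return 1.0 - (1.0 - allele_freq) ** (2 * n_individuals)


def expected_discovered(site_freqs: Iterable[float], n_individuals: int) -> float:
    """Expected number of distinct variant sites observed in n individuals."""
    return sum(detection_probability(p, n_individuals) for p in site_freqs)


def discovered_per_sample(site_freqs: Iterable[float], n_individuals: int) -> float:
    """Expected discoveries divided by sample count (the Figure 1A style rate)."""
    return expected_discovered(site_freqs, n_individuals) / n_individuals


# Hypothetical spectrum: a few common sites and many rare ones.
SPECTRUM = [0.2] * 100 + [0.0001] * 10_000

for n in (280, 501, 1092):  # cohort sizes quoted in the text
    print(n,
          round(expected_discovered(SPECTRUM, n), 1),
          round(discovered_per_sample(SPECTRUM, n), 3))
```

Because each term 1 - (1 - p)^(2n) is concave in n and equals 0 at n = 0, the per-sample rate can only fall as n grows: common sites saturate almost immediately, while rare sites keep adding discoveries slowly.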

