In silico detection of sequence variations modifying transcriptional regulation.

Andersen MC, Engström PG, Lithwick S, Arenillas D, Eriksson P, Lenhard B, Wasserman WW, Odeberg J - PLoS Comput. Biol. (2007)

Bottom Line: Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers).

Affiliation: Department of Gene Technology, School of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), Stockholm, Sweden.

ABSTRACT
Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico-driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction makes its application impractical. However, when additional data are available that indicate which transcription factor is involved in regulating the gene, in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analysis. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.
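
A common way to ask whether a SNP changes a predicted transcription factor binding site is to score the sequence carrying each allele with a position weight matrix and compare the best site scores. The sketch below illustrates that idea only; it is not RAVEN's implementation. The motif counts, the pseudocount of 0.5, the uniform background, the forward-strand-only scan and all function names are hypothetical choices made for illustration.

```python
import math

BASES = "ACGT"

# Hypothetical position frequency matrix for a 4-bp motif with consensus TGAC.
# Counts are invented for illustration; this is NOT a real JASPAR profile.
HYPOTHETICAL_PFM = [
    {"A": 1, "C": 1, "G": 1, "T": 9},
    {"A": 1, "C": 1, "G": 9, "T": 1},
    {"A": 9, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 9, "G": 1, "T": 1},
]


def log_odds_matrix(pfm, pseudocount=0.5, background=0.25):
    """Convert per-position base counts to log2-odds scores vs a uniform background."""
    pwm = []
    for column in pfm:
        total = sum(column.values()) + len(BASES) * pseudocount
        pwm.append(
            {b: math.log2((column[b] + pseudocount) / total / background) for b in BASES}
        )
    return pwm


def best_site_score(seq, pwm):
    """Highest motif score over all windows of seq (forward strand only, an assumption)."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )


def allele_scores(left, ref, alt, right, pwm):
    """Score the reference and alternative alleles in the same flanking context."""
    return (
        best_site_score(left + ref + right, pwm),
        best_site_score(left + alt + right, pwm),
    )


pwm = log_odds_matrix(HYPOTHETICAL_PFM)
# Hypothetical SNP: G>A inside the TGAC site of the toy sequence CCTGACCC.
ref_score, alt_score = allele_scores("CCT", "G", "A", "ACCC", pwm)
print(f"ref={ref_score:.2f} alt={alt_score:.2f} delta={alt_score - ref_score:.2f}")
```

A large drop in score for one allele flags the SNP as a candidate for disrupting the site; as the abstract notes, such predictions are only informative when there is independent evidence about which factor regulates the gene.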

Figure 3. Fractions of Regulatory and Background SNPs in Evolutionarily Conserved Regions. Each SNP was assigned the mean phastCons score, computed from multiple alignments of human, chimp, mouse, rat, dog, chicken, fugu, and zebrafish, over a 21-bp window centered on the SNP. The fractions of SNPs located within conserved regions were calculated for mean phastCons score thresholds between 0.1 and 0.9. For each threshold, Fisher's exact test was used to test whether the frequency of SNPs within conserved regions differed significantly between the regulatory and background SNP sets; p-values are indicated above each pair of bars.
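
The per-SNP conservation measure described in the caption can be sketched as follows, assuming per-base phastCons scores have already been retrieved (for example from the UCSC genome browser conservation track) into a plain 0-based list. A 21-bp window centered on a SNP spans 10 bp on either side. Clipping windows at the ends of the list, counting a SNP as conserved when its mean is greater than or equal to the threshold, and the function names are all assumptions rather than details stated in the source.

```python
from statistics import mean

# Thresholds from the caption: 0.1, 0.2, ..., 0.9.
THRESHOLDS = [round(0.1 * k, 1) for k in range(1, 10)]


def window_mean(scores, pos, half_width=10):
    """Mean phastCons score in a (2 * half_width + 1)-bp window centered on pos.

    Windows are clipped at the ends of the score list (an assumption).
    """
    start = max(0, pos - half_width)
    end = min(len(scores), pos + half_width + 1)
    return mean(scores[start:end])


def fraction_conserved(window_means, threshold):
    """Fraction of SNPs whose window mean is >= threshold (>= is an assumption)."""
    return sum(1 for m in window_means if m >= threshold) / len(window_means)


def fractions_by_threshold(scores, snp_positions):
    """Map each threshold to the fraction of SNPs falling in conserved windows."""
    means = [window_mean(scores, p) for p in snp_positions]
    return {t: fraction_conserved(means, t) for t in THRESHOLDS}


# Toy data (not from the paper): a 21-bp fully conserved block, then 30 unconserved bases.
toy_scores = [1.0] * 21 + [0.0] * 30
print(fractions_by_threshold(toy_scores, [10, 40]))
```

Running the same function on the regulatory and the background SNP positions gives the two bar heights per threshold that the figure compares.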

Mentions: We tested whether phylogenetic footprinting can enrich for bona fide rSNPs. The underlying principle of phylogenetic footprinting is that functional noncoding elements are more likely to be evolutionarily conserved than the nonfunctional sequence surrounding them. Restricting the search for regulatory genetic variation to conserved regions should therefore increase the proportion of functional sites among the candidates. We thus tested how often those of our experimentally verified regulatory polymorphisms that lie upstream of their genes were located within conserved genomic regions. Conservation of the rSNP positions was quantified using the phastCons scores [32] available at the UCSC genome browser, computed from alignments of the May 2004 release of the human genome with the chimp, mouse, rat, dog, chicken, fugu, and zebrafish genomes. We performed the same analysis for 26,044 background SNPs from dbSNP located within 10 kb upstream of human genes with known mouse orthologs. Figure 3 shows that SNPs with a documented effect on gene regulation are located within evolutionarily conserved sequences more frequently than background SNPs. For example, when a phastCons score threshold of 0.4 was used to define conserved regions, approximately 28% of the rSNPs were retained compared with only 9% of the background SNPs. The frequency of SNPs falling within conserved regions differed significantly between the rSNP dataset and the background set for all phastCons score thresholds above 0.1 (Figure 3).
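
At each threshold, the comparison reduces to a 2x2 contingency table: rows are the regulatory and background SNP sets, columns are SNPs inside and outside conserved regions. The sketch below computes a Fisher's exact test p-value with the standard library only, assuming a two-sided test (the source does not state the sidedness). SciPy's `scipy.stats.fisher_exact` provides the same test; this version just makes the computation explicit. The counts in the example are invented for illustration and are not the paper's data.

```python
from math import comb


def hypergeom_pmf(a, row1, col1, n):
    """Probability that the top-left cell equals a, given fixed table margins."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: SNP set (regulatory, background); columns: inside / outside conserved
    regions. Sums the probabilities of all tables with the same margins that
    are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_value = 0.0
    for x in range(lo, hi + 1):
        p = hypergeom_pmf(x, row1, col1, n)
        # Small relative tolerance so tables tied with the observed one are counted.
        if p <= p_obs * (1 + 1e-7):
            p_value += p
    return min(1.0, p_value)


# Hypothetical counts for one threshold (invented, not from the paper):
# 28 of 100 regulatory SNPs and 9 of 100 background SNPs fall in conserved windows.
table = (28, 100 - 28, 9, 100 - 9)
print(f"p = {fisher_exact_two_sided(*table):.2e}")
```

Repeating the test once per threshold, as in Figure 3, yields one p-value per pair of bars; with nine thresholds tested, a multiple-testing correction may be worth considering when interpreting them.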