In silico detection of sequence variations modifying transcriptional regulation.

Andersen MC, Engström PG, Lithwick S, Arenillas D, Eriksson P, Lenhard B, Wasserman WW, Odeberg J - PLoS Comput. Biol. (2007)

Bottom Line: Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers).


Affiliation: Department of Gene Technology, School of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), Stockholm, Sweden.

ABSTRACT
Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.
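To make the binding site side of the approach concrete, the following is a minimal sketch of how a SNP's effect on a predicted transcription factor binding site could be scored. It is not the RAVEN implementation, which this text does not describe. The matrix `TOY_PWM`, the function names, and the choice to scan only the forward strand are all illustrative assumptions.

```python
# Hypothetical log-odds matrix for a 4 bp motif; one dict per motif position.
TOY_PWM = [
    {"A": 1.2, "C": -1.0, "G": -1.0, "T": -0.5},
    {"A": -1.0, "C": 1.5, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.5, "T": -1.0},
    {"A": -0.5, "C": -1.0, "G": -1.0, "T": 1.2},
]


def best_site_score(seq, snp_index, pwm):
    """Best matrix score among forward-strand windows that overlap snp_index."""
    width = len(pwm)
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    scores = [
        sum(col[base] for col, base in zip(pwm, seq[start:start + width]))
        for start in range(first, last + 1)
    ]
    if not scores:
        raise ValueError("sequence is shorter than the motif")
    return max(scores)


def allele_score_difference(seq, snp_index, ref, alt, pwm):
    """Score change (alternate minus reference) for the best overlapping site."""
    if seq[snp_index] != ref:
        raise ValueError("reference allele does not match the sequence")
    alt_seq = seq[:snp_index] + alt + seq[snp_index + 1:]
    return best_site_score(alt_seq, snp_index, pwm) - best_site_score(seq, snp_index, pwm)


# Toy example: a C>T change at index 3 disrupts the best-scoring site "ACGT".
delta = allele_score_difference("TTACGTTT", 3, "C", "T", TOY_PWM)
```

A negative `delta` means the alternate allele weakens the best predicted site. As the abstract notes, such predictions are poorly specific on their own, so a score change like this is more useful when other data already point to a particular transcription factor.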


Figure 4 (pcbi-0040005-g004): Distributions of Mean phastCons Scores for SNPs Located at Different Distances from the TSS. SNPs were assigned the mean phastCons scores from multiple alignments of human, chimp, mouse, rat, dog, chicken, fugu, and zebrafish in windows of 21 bp centered at the SNPs. For every interval, a Student's t-test was performed to check for significant differences in the distributions of phastCons values between the regulatory and background SNPs; the p-values from these tests are indicated above each pair of boxes.
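The window averaging described in the caption can be sketched as follows. This is an illustrative example, not the authors' code: the function name `mean_window_score`, the list-based input, and the truncation of windows at region edges are assumptions.

```python
from statistics import mean


def mean_window_score(scores, snp_index, window=21):
    """Mean conservation score in a window centered on snp_index.

    scores: per-base scores (e.g. phastCons values in [0, 1]) for one region.
    Windows that run past either end of the region are truncated (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered on the SNP")
    if not 0 <= snp_index < len(scores):
        raise IndexError("snp_index outside the scored region")
    half = window // 2
    start = max(0, snp_index - half)
    end = min(len(scores), snp_index + half + 1)
    return mean(scores[start:end])


# Toy region of 41 positions with a conserved block (score 1.0) at positions 10-30.
toy_scores = [0.0] * 10 + [1.0] * 21 + [0.0] * 10
centered = mean_window_score(toy_scores, 20)  # window covers positions 10-30
shifted = mean_window_score(toy_scores, 25)   # window covers positions 15-35
```

With a 21 bp window, a SNP in the middle of the conserved block gets the full score, while a SNP nearer the edge of the block gets a lower mean because part of its window falls outside the block.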

Mentions: Figure 4 shows that for SNPs located 10 kb to 2 kb upstream of the TSS of the respective genes, as well as 2 kb to 500 bases upstream, there was no significant difference between the phastCons scores of the regulatory and background SNPs. However, for SNPs in the interval from 500 bp upstream to the TSS, as well as for the full dataset, the phastCons scores were significantly higher for the regulatory than for the background SNPs (p-values 0.001 and 0.0002, respectively). In the interval closest to the TSS, the phastCons scores were also higher for the regulatory than for the background SNPs, but the difference was not statistically significant. In the intervals closest to the TSS, the bias toward locations nearer the TSS for regulatory SNPs (rSNPs) is small or eliminated: in the interval from −500 bp to the TSS, the median distances to the TSS were 168 bp and 237 bp for the regulatory and background SNPs, respectively, and in the interval from 100 bp upstream to the TSS, the median distance for both datasets was 51 bases. This suggests that the higher fraction of rSNPs in conserved regions relative to the background is not simply an effect of the rSNPs being located closer to the TSS than the background SNPs.
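The per-interval comparison described above can be sketched as follows, using only the Python standard library. The interval boundaries come from the text, but the half-open boundary handling, the `(position, score)` tuple layout, the function names, and the toy values are assumptions, not data from the study. The sketch computes the pooled-variance Student's t statistic and its degrees of freedom. Converting these to a p-value requires the t distribution, for example via `scipy.stats.ttest_ind`, which is left out here to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, variance

# Intervals relative to the TSS (negative = upstream), as named in the text.
# Half-open [low, high) bounds are an assumption; high = 1 includes the TSS itself.
INTERVALS = {
    "-10 kb to -2 kb": (-10000, -2000),
    "-2 kb to -500 bp": (-2000, -500),
    "-500 bp to TSS": (-500, 1),
    "-100 bp to TSS": (-100, 1),
}


def scores_in_interval(snps, low, high):
    """Scores of SNPs whose TSS-relative position lies in [low, high)."""
    return [score for pos, score in snps if low <= pos < high]


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two values")
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        raise ValueError("pooled variance is zero; t is undefined")
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, df


# Toy (position, mean phastCons score) pairs; values are made up for illustration.
regulatory = [(-50, 0.9), (-80, 0.7), (-300, 0.8), (-3000, 0.2)]
background = [(-60, 0.3), (-90, 0.1), (-400, 0.2), (-5000, 0.2)]

low, high = INTERVALS["-500 bp to TSS"]
reg_scores = scores_in_interval(regulatory, low, high)
bg_scores = scores_in_interval(background, low, high)
t_stat, dof = student_t(reg_scores, bg_scores)
```

Comparing the two groups within each interval, rather than only across the full dataset, is what lets the analysis separate the effect of conservation from the effect of proximity to the TSS.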

