In silico detection of sequence variations modifying transcriptional regulation.

Andersen MC, Engström PG, Lithwick S, Arenillas D, Eriksson P, Lenhard B, Wasserman WW, Odeberg J - PLoS Comput. Biol. (2007)

Bottom Line: Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers).


Affiliation: Department of Gene Technology, School of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), Stockholm, Sweden.

ABSTRACT
Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico-driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as on background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction prohibits its application. However, when additional data are available that indicate which transcription factor is involved in the regulation of the gene, in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.


Figure 5. Combination of TFBS Analysis and Phylogenetic Footprinting. Sensitivity of the predictions is plotted versus 1 − specificity for phastCons score thresholds of 0, 0.1, 0.2, and so on up to 0.9. The full range of values is shown only for the red curve; for the other curves, the points for phastCons score thresholds 0 and 0.1 fall outside the plotted area. The curves correspond to different TFBS score delta thresholds. In the left panel, the relative TFBS score threshold for the best-matching allele was 80%; in the right panel, it was 90%.
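The two TFBS thresholds in this caption, the relative score of the best-matching allele and the score delta between alleles, can be illustrated with a small position weight matrix (PWM) sketch. It assumes, as in many PWM tools, that the relative score rescales a site's summed log-odds score linearly between the matrix's minimum and maximum attainable scores, and that the score delta is the absolute difference between the two alleles' scores in the same units; RAVEN's exact definitions may differ. The matrix `PWM` and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical 4-position log-odds matrix (one dict per position). It is not a
# real JASPAR profile; it only illustrates how allele scores are compared.
PWM: List[Dict[str, float]] = [
    {"A": 1.2, "C": -0.8, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": 1.5, "G": -0.7, "T": -0.9},
    {"A": -0.6, "C": -1.1, "G": 1.4, "T": -0.4},
    {"A": 0.9, "C": -0.3, "G": -1.2, "T": 0.2},
]


def absolute_score(pwm: List[Dict[str, float]], site: str) -> float:
    """Sum of per-position log-odds scores for a site as long as the matrix."""
    if len(site) != len(pwm):
        raise ValueError("site length must equal matrix length")
    return sum(col[base] for col, base in zip(pwm, site.upper()))


def relative_score(pwm: List[Dict[str, float]], site: str) -> float:
    """Absolute score rescaled to 0-100% of the matrix's min-to-max range."""
    s_min = sum(min(col.values()) for col in pwm)
    s_max = sum(max(col.values()) for col in pwm)
    return 100.0 * (absolute_score(pwm, site) - s_min) / (s_max - s_min)


def allele_effect(pwm, ref_site: str, alt_site: str) -> Tuple[float, float]:
    """Relative score of the best-matching allele and the absolute score delta."""
    ref = absolute_score(pwm, ref_site)
    alt = absolute_score(pwm, alt_site)
    best_site = ref_site if ref >= alt else alt_site
    return relative_score(pwm, best_site), abs(ref - alt)


def alters_tfbs(pwm, ref_site: str, alt_site: str,
                rel_threshold: float = 80.0, delta_threshold: float = 1.0) -> bool:
    """True if the best allele is a predicted site and the SNP shifts its score enough."""
    best_rel, delta = allele_effect(pwm, ref_site, alt_site)
    return best_rel >= rel_threshold and delta >= delta_threshold


if __name__ == "__main__":
    # Reference allele G vs. alternative allele T at the third matrix position.
    print(allele_effect(PWM, "ACGA", "ACTA"))
    print(alters_tfbs(PWM, "ACGA", "ACTA", rel_threshold=90.0, delta_threshold=2.0))
```

Raising `rel_threshold` from 80 to 90 restricts the analysis to stronger candidate sites, while raising `delta_threshold` demands a larger allelic score change; the two parameters correspond to the panels and curves described in the caption.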

Mentions: Although TFBS analysis alone did not enrich for rSNPs relative to the background, we tested whether intersecting TFBS analysis with phylogenetic footprinting could increase the enrichment obtained with phylogenetic footprinting alone. We counted the regulatory and background SNPs that both affected predicted TFBSs and were located in conserved regions (predicted rSNPs) for phastCons score thresholds 0.1 to 0.9 and for TFBS score delta thresholds from one to nine. We also counted the number of predicted rSNPs based on phylogenetic footprinting alone. Figure 5 shows the sensitivity (fraction of regulatory SNPs predicted as rSNPs) versus 1 − specificity (specificity being the fraction of nonpredicted background SNPs) for the different thresholds, with one curve per TFBS score delta threshold. When the relative TFBS score threshold for the best-matching allele was set to 80% (left panel), there was virtually no difference in performance between phylogenetic footprinting alone and the combined analysis with TFBS score delta thresholds below five, and for larger score delta thresholds the sensitivity was very low. When the relative TFBS score threshold for the best-matching allele was increased to 90% (right panel), TFBS analysis likewise provided no enrichment compared with phylogenetic footprinting alone. Variants in particularly high-scoring TFBSs fail to disrupt the sites, so rSNP predictions for higher-scoring TFBS candidates (Figure 5, right panel) are less predictive than those for predicted sites with lower initial scores (Figure 5, left panel).
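The counting described in this paragraph amounts to a threshold sweep that yields one (sensitivity, 1 − specificity) point per phastCons threshold, with one curve per TFBS score delta threshold. A minimal Python sketch follows. It assumes each SNP has already been reduced to its phastCons score and to the largest TFBS score delta among predicted sites whose best-matching allele passed the relative score threshold; the `SnpRecord` fields, the helper names, and `EXAMPLE_SNPS` are hypothetical and are neither RAVEN's code nor the study's data.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SnpRecord:
    is_regulatory: bool  # True for a reported rSNP, False for a background SNP
    phastcons: float     # phastCons conservation score at the SNP (0-1)
    tfbs_delta: float    # largest TFBS score delta (0.0 if no site qualified)


def is_predicted(snp: SnpRecord, phastcons_t: float,
                 delta_t: Optional[float] = None) -> bool:
    """Footprinting alone if delta_t is None, otherwise footprinting AND TFBS."""
    conserved = snp.phastcons >= phastcons_t
    if delta_t is None:
        return conserved
    return conserved and snp.tfbs_delta >= delta_t


def roc_point(snps: List[SnpRecord], phastcons_t: float,
              delta_t: Optional[float] = None) -> Tuple[float, float]:
    """Return (sensitivity, 1 - specificity) for one threshold combination."""
    regulatory = [s for s in snps if s.is_regulatory]
    background = [s for s in snps if not s.is_regulatory]
    tp = sum(is_predicted(s, phastcons_t, delta_t) for s in regulatory)
    fp = sum(is_predicted(s, phastcons_t, delta_t) for s in background)
    # Sensitivity: predicted regulatory SNPs / all regulatory SNPs.
    # 1 - specificity: predicted background SNPs / all background SNPs.
    return tp / len(regulatory), fp / len(background)


def sweep(snps: List[SnpRecord],
          delta_t: Optional[float] = None) -> List[Tuple[float, float, float]]:
    """One curve: (phastCons threshold, sensitivity, 1 - specificity) per point."""
    thresholds = [round(0.1 * i, 1) for i in range(10)]  # 0, 0.1, ..., 0.9
    return [(t, *roc_point(snps, t, delta_t)) for t in thresholds]


# Hypothetical toy data, not the SNP sets analysed in the study.
EXAMPLE_SNPS = [
    SnpRecord(True, 0.95, 3.0),
    SnpRecord(True, 0.60, 0.0),
    SnpRecord(False, 0.80, 2.0),
    SnpRecord(False, 0.10, 4.0),
    SnpRecord(False, 0.00, 0.0),
]

if __name__ == "__main__":
    footprinting_only = sweep(EXAMPLE_SNPS)
    curves = {d: sweep(EXAMPLE_SNPS, d) for d in range(1, 10)}  # delta 1..9
    print(footprinting_only[5])
    print(curves[1][5])
```

Because the combined filter keeps only SNPs that also pass the footprinting filter, it can only remove predictions; it improves enrichment only if it discards background SNPs faster than regulatory ones, which is the comparison the two panels of Figure 5 make.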
