In silico detection of sequence variations modifying transcriptional regulation.

Andersen MC, Engström PG, Lithwick S, Arenillas D, Eriksson P, Lenhard B, Wasserman WW, Odeberg J - PLoS Comput. Biol. (2007)

Bottom Line: Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers).


Affiliation: Department of Gene Technology, School of Biotechnology, AlbaNova University Center, Royal Institute of Technology (KTH), Stockholm, Sweden.

ABSTRACT
Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.


Figure 1 (pcbi-0040005-g001): Impact on TFBS Score of Mutations Inserted into Synthesized TFBS Sequences. The boxes correspond to score deltas for (from left to right) 1 bp substitutions, 2 bp substitutions at adjacent positions, two randomly placed 1 bp substitutions, 3 bp substitutions both in adjacent and at random positions, four randomly placed base pair substitutions, five randomly placed substitutions, one randomly placed 1 bp insertion, and one randomly placed 1 bp deletion.

Mentions: Stormo and colleagues have shown that PWM representations of TFBSs give scores that are proportional to the binding energy between the DNA sequence and the protein [30] (see Figure S1A and S1B for two examples where this holds true). This implies that matrix models of TFBSs can be used to estimate the effect of genetic variation in a regulatory region on transcription factor binding affinity. Our in silico prediction tool identifies polymorphisms for which the score assigned to a TFBS model differs between the two allelic sequences. To define the expected ranges of allelic differences in TFBS scores for various types of mutations, we generated panels of simulated binding sites based on the distribution of bases at each position of the TFBS frequency matrices in the JASPAR database [31], representing wild-type TFBS sequences. We then inserted mutations into the generated sequences to produce a collection of synthetic regulatory sequence variants. Various types of mutations, including 1–5 bp substitutions, 1 bp insertions, and 1 bp deletions, were introduced into the generated sequences; see Methods for a detailed description of the mutation classes. For every generated sequence, we computed a TFBS score delta by comparing the TFBS score for the wild-type allele (the sequence generated from the frequency matrix of the TFBS) with the one obtained for the mutated allele. For an explanation of the scoring system and its scale, see [24]. The box plot in Figure 1 shows that the majority of the simple 1 bp substitutions give score deltas below four, and that mutating additional bases gives higher score deltas. Insertion and deletion polymorphisms behave similarly to each other, with approximately the same median score delta as the 1 bp substitutions, but with a higher third quartile and more outliers.
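
The simulation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the RAVEN implementation: the count matrix is a made-up 6 bp profile rather than a JASPAR entry, the score is a plain log2-odds sum with a pseudocount and a uniform background (the paper's actual scoring scale is described in [24]), each site is embedded in random flanks so that insertions and deletions can still be scanned, and all function names are hypothetical. The resulting deltas are therefore not on the same scale as those in Figure 1.

```python
import math
import random
import statistics

BASES = "ACGT"

# Hypothetical count matrix for a 6 bp site; NOT a real JASPAR profile.
COUNTS = {
    "A": [18, 1, 0, 2, 16, 3],
    "C": [1, 17, 1, 0, 2, 4],
    "G": [0, 1, 18, 1, 1, 10],
    "T": [1, 1, 1, 17, 1, 3],
}

def to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert counts to log2-odds weights (assumed scoring scheme)."""
    pwm = []
    for i in range(len(counts["A"])):
        total = sum(counts[b][i] for b in BASES) + pseudocount
        pwm.append({
            b: math.log2((counts[b][i] + pseudocount * background) / total / background)
            for b in BASES
        })
    return pwm

def sample_site(counts, rng):
    """Draw a wild-type site from the per-position base distribution."""
    return "".join(
        rng.choices(BASES, weights=[counts[b][i] for b in BASES])[0]
        for i in range(len(counts["A"]))
    )

def best_score(pwm, seq):
    """Best forward-strand score over all windows of the matrix width."""
    width = len(pwm)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(len(seq) - width + 1)
    )

def substitute(seq, positions, rng):
    """Replace each listed position with a different, randomly chosen base."""
    out = list(seq)
    for p in positions:
        out[p] = rng.choice([b for b in BASES if b != out[p]])
    return "".join(out)

def score_delta(pwm, wild, mutant):
    """Wild-type score minus mutant score."""
    return best_score(pwm, wild) - best_score(pwm, mutant)

def simulate(n_sites=200, flank=5, seed=0):
    """Collect score deltas for three of the mutation classes."""
    rng = random.Random(seed)
    pwm = to_pwm(COUNTS)
    width = len(pwm)
    deltas = {"sub1": [], "ins1": [], "del1": []}
    for _ in range(n_sites):
        left = "".join(rng.choice(BASES) for _ in range(flank))
        right = "".join(rng.choice(BASES) for _ in range(flank))
        wild = left + sample_site(COUNTS, rng) + right
        pos = flank + rng.randrange(width)  # mutate inside the site
        mutants = {
            "sub1": substitute(wild, [pos], rng),
            "ins1": wild[:pos] + rng.choice(BASES) + wild[pos:],
            "del1": wild[:pos] + wild[pos + 1:],
        }
        for name, mutant in mutants.items():
            deltas[name].append(score_delta(pwm, wild, mutant))
    return deltas

if __name__ == "__main__":
    for name, values in simulate().items():
        print(name, "median delta:", round(statistics.median(values), 2))
```

Multi-base substitution classes could be added by passing several positions to `substitute`, for example adjacent positions or a random sample drawn with `rng.sample`. Because the mutant is rescanned over all windows, a delta can be negative when a mutation happens to create a better-scoring match.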

