FRESCo: finding regions of excess synonymous constraint in diverse viruses.

Sealfon RS, Lin MF, Jungreis I, Wolf MY, Kellis M, Sabeti PC - Genome Biol. (2015)

Bottom Line: Synonymous substitutions in these regions would be selectively disfavored and thus these regions are characterized by excess synonymous constraint. Codon choice can also modulate transcriptional efficiency, translational accuracy, and protein folding. We developed a phylogenetic codon model-based framework, FRESCo, designed to find regions of excess synonymous constraint in short, deep alignments, such as individual viral genes across many sequenced isolates.


ABSTRACT

Background: The increasing availability of sequence data for many viruses provides power to detect regions under unusual evolutionary constraint at a high resolution. One approach leverages the synonymous substitution rate as a signature to pinpoint genic regions encoding overlapping or embedded functional elements. Protein-coding regions in viral genomes often contain overlapping RNA structural elements, reading frames, regulatory elements, microRNAs, and packaging signals. Synonymous substitutions in these regions would be selectively disfavored and thus these regions are characterized by excess synonymous constraint. Codon choice can also modulate transcriptional efficiency, translational accuracy, and protein folding.
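To make "synonymous substitution" concrete: a substitution is synonymous when the original and mutated codons encode the same amino acid, so the protein is unchanged. The sketch below classifies a codon change using the standard genetic code (NCBI translation table 1). The names `CODON_TABLE` and `is_synonymous` are hypothetical helpers for illustration, not part of FRESCo; the sketch also assumes a change between two stop codons counts as synonymous.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order,
# so the amino-acid string lines up with TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon_a: str, codon_b: str) -> bool:
    """True if the codons differ but encode the same amino acid ('*' = stop)."""
    a, b = codon_a.upper(), codon_b.upper()
    return a != b and CODON_TABLE[a] == CODON_TABLE[b]


print(is_synonymous("CTT", "CTG"))  # Leu -> Leu: synonymous
print(is_synonymous("GAT", "GAA"))  # Asp -> Glu: nonsynonymous
```

In a region with excess synonymous constraint, changes like CTT to CTG would be observed less often than expected, even though they leave the encoded protein intact.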

Results: We developed a phylogenetic codon model-based framework, FRESCo, designed to find regions of excess synonymous constraint in short, deep alignments, such as individual viral genes across many sequenced isolates. We demonstrated the high specificity of our approach on simulated data and applied our framework to the protein-coding regions of approximately 30 distinct species of viruses with diverse genome architectures.
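FRESCo's test is built on a phylogenetic codon model fitted to the whole alignment, which this page does not detail. As a hedged illustration of the general idea (asking whether a window's synonymous rate is significantly lower than the normalized genome-wide value of 1), the sketch below swaps in a toy Poisson model of synonymous substitution counts and a standard likelihood-ratio test. The function `window_lrt` and its inputs `observed_syn` and `expected_syn` are hypothetical; in a real analysis the expected count would come from the fitted codon model and tree.

```python
import math


def window_lrt(observed_syn: int, expected_syn: float) -> tuple[float, float]:
    """Toy likelihood-ratio test for one window (Poisson stand-in, not FRESCo's model).

    Null: synonymous rate fixed at 1, so the count is Poisson(expected_syn).
    Alternative: rate free; its maximum-likelihood estimate is observed / expected.
    Returns (estimated_rate, p_value) using a chi-square reference with 1 df.
    """
    k, mu = observed_syn, expected_syn
    rate = k / mu
    # 2 * (lnL_alt - lnL_null); the log(k!) terms cancel, and k*log(k/mu) -> 0 as k -> 0.
    term = k * math.log(k / mu) if k > 0 else 0.0
    stat = 2.0 * (term - k + mu)
    # Chi-square(1 df) survival function: P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return rate, p_value


def is_constrained(observed_syn: int, expected_syn: float, alpha: float = 0.05) -> bool:
    """Flag a window whose estimated rate is below 1 and whose test is significant."""
    rate, p_value = window_lrt(observed_syn, expected_syn)
    return rate < 1.0 and p_value < alpha


print(window_lrt(0, 10.0))   # no synonymous changes where ~10 were expected
print(window_lrt(10, 10.0))  # exactly as many as expected: rate 1.0, p-value 1.0
```

The likelihood ratio is used because it compares a restricted null model against a model with the window rate set free, which is the same nesting structure a codon-model test would have.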

Conclusions: FRESCo recovers known multifunctional regions in well-characterized viruses such as hepatitis B virus, poliovirus, and West Nile virus, often at a single-codon resolution, and predicts many novel functional elements overlapping viral genes, including in Lassa and Ebola viruses. In a number of viruses, the synonymously constrained regions that we identified also display conserved, stable predicted RNA structures, including putative novel elements in multiple viral species.


Figure 2. FRESCo demonstrates high specificity in tests on simulated regions of excess synonymous constraint. (A) On a simulated dataset of 1,000 sequences with regions of varying strength of synonymous constraint, FRESCo recovers SCEs with high accuracy. We plot the synonymous substitution rate at a 10-codon resolution, displaying below the plot the relative synonymous substitution rate in each portion of the sequence. The red tracks at the bottom show recovered regions of significant excess synonymous constraint at window sizes of 1, 5, 10, 20, and 50 codons. (B) Recovery of simulated regions of excess synonymous constraint improves with increasing branch length (in substitutions/site), strength of synonymous constraint, and number of aligned sequences (5-codon sliding windows). (C) Distribution of P-values in simulated sequences with no synonymous constraint. Q-Q plots of the distribution of P-values for 5-codon sliding windows in simulations based on alignments of 100 (top), 500 (middle), and 1,000 (bottom) random sequences. Each plot is based on 20 independent, 500-codon simulated alignments (total of 10,000 codons).
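Panel C checks calibration: when there is no synonymous constraint, window P-values should be roughly Uniform(0, 1), so sorted observed P-values should track their expected uniform quantiles. The sketch below computes Q-Q coordinates on a -log10 scale and enumerates 5-codon sliding windows over a 500-codon alignment. The helper names and the one-codon step between windows are assumptions for illustration; the caption does not state the step size.

```python
import math


def qq_points(p_values: list[float]) -> list[tuple[float, float]]:
    """(expected, observed) pairs on a -log10 scale for a uniform-null Q-Q plot.

    Under the null, the i-th smallest of n P-values (0-indexed) is expected near
    (i + 0.5) / n. All P-values must be strictly positive for the log transform.
    """
    n = len(p_values)
    observed = sorted(p_values)
    return [
        (-math.log10((i + 0.5) / n), -math.log10(p))
        for i, p in enumerate(observed)
    ]


def sliding_windows(n_codons: int, width: int) -> list[range]:
    """Codon-index ranges of every window of `width` codons, stepping one codon."""
    return [range(start, start + width) for start in range(n_codons - width + 1)]


windows = sliding_windows(500, 5)
print(len(windows), windows[0], windows[-1])
```

Points lying on the diagonal indicate well-calibrated P-values; points bending above it at the high end would indicate an excess of false positives.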

In this simulated alignment, FRESCo accurately recovers both the long, weak SCE and the short, strong SCE (Figure 2A). As expected, the short SCE is well captured by smaller sliding windows (and is in fact recovered quite accurately at single-codon resolution), while the long region of weaker constraint is best recovered at larger window sizes. Because the synonymous substitution rate is normalized so that its genome-wide average is 1, the estimated rate outside the regions of synonymous constraint is >1.
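The normalization explains why unconstrained regions sit above 1: if some codons have a depressed synonymous rate and the overall mean is forced to 1, the remaining codons must compensate. The sketch below assumes per-codon rates are already available as plain numbers (FRESCo estimates rates with its codon model; this toy skips that step) and uses hypothetical helpers `normalize_rates` and `window_means`.

```python
def normalize_rates(raw_rates: list[float]) -> list[float]:
    """Rescale per-codon synonymous rates so that their mean is exactly 1."""
    mean = sum(raw_rates) / len(raw_rates)
    return [r / mean for r in raw_rates]


def window_means(rates: list[float], width: int) -> list[float]:
    """Mean rate in each window of `width` consecutive codons (one-codon step)."""
    return [
        sum(rates[start:start + width]) / width
        for start in range(len(rates) - width + 1)
    ]


# Hypothetical toy gene: 80 unconstrained codons followed by a 20-codon constrained region.
raw = [1.0] * 80 + [0.2] * 20
rates = normalize_rates(raw)
print(round(rates[0], 3))                       # unconstrained codons end up above 1
print(round(rates[-1], 3))                      # constrained codons stay well below 1
print(round(min(window_means(rates, 20)), 3))   # a 20-codon window spans the region exactly
```

Averaging over a window matched to the region's length gives the clearest signal; a much larger window would dilute a short region with surrounding unconstrained codons, consistent with short SCEs being captured best by small windows.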

