Estimates of the effect of natural selection on protein-coding content.

Yap VB, Lindsay H, Easteal S, Huttley G - Mol. Biol. Evol. (2009)

Bottom Line: For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.


Affiliation: Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore. stayapvb@nus.edu.sg

ABSTRACT
Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease gave significant discrepancies in estimates with approximately 10-30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.
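The three regimes of omega described in the abstract can be written down as a small lookup. This is only an illustrative sketch: the function name `selection_mode` and the tolerance parameter `tol` are hypothetical and do not come from the paper, and real analyses rely on statistical tests rather than a bare threshold on a point estimate.

```python
def selection_mode(omega: float, tol: float = 0.0) -> str:
    """Map a point estimate of omega to the mode of selection.

    `tol` is a hypothetical tolerance around the neutral value omega = 1;
    it is an assumption of this sketch, not a quantity from the paper.
    """
    if omega < 0:
        raise ValueError("omega is a ratio of rates and cannot be negative")
    if abs(omega - 1.0) <= tol:
        return "neutral"
    # omega < 1: nonsynonymous changes are removed (purifying selection);
    # omega > 1: nonsynonymous changes are favoured (positive selection).
    return "purifying" if omega < 1.0 else "positive"


if __name__ == "__main__":
    for value in (0.2, 1.0, 2.5):
        print(value, selection_mode(value))
```

Because the classification depends entirely on which side of 1 the estimate falls, any systematic bias in the estimate of omega can change the inferred mode of selection, which is the concern the abstract raises.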



Fig. 1: The effect of nucleotide composition and nonmultiplicative codon frequencies on estimates of ω from simulated neutrally evolving genes. Sequence simulations were based on an AT-rich gene sampled from Borrelia species, a primate gene with AT% ≈ GC%, and a GC-rich gene sampled from Mycobacterium species. Average GC% of the simulated alignments is shown. The x axis is the estimate of ω, and the y axis is an estimate of density. (A) Data generated from a NFGTR(ω = 1) model resulting in multiplicative codon frequencies. (B) Data generated from a CNFGTR(ω = 1) model with observed (nonmultiplicative) codon frequencies from the sampled genes. The dashed vertical line shows the expected neutral value, ω = 1.

Mentions: Simulations of neutrally evolving genes confirmed the predicted sensitivity of the CF (codon frequency) and NF (nucleotide frequency) models to sequence composition. Simulations were carried out with multiplicative and nonmultiplicative codon frequencies using parameters estimated by fitting GTR variants of the NF and CNF (conditional nucleotide frequency) models, respectively, to real sequences with GC% ranging from 30% to 65% (see Theory and Methods). For multiplicative codon frequencies (simulated under NFGTR(ω = 1)), estimates of ω from the NF and CNF models were similar (fig. 1A), with the largest difference evident for the AT-rich sequences, consistent with the expected bias affecting NF models due to the AT-richness of stop codons (Theory and Methods). As predicted (Lindsay et al. 2008), estimates of ω obtained under the CF model were strongly affected by composition, moving from below 1 to above 1 as sequence composition changed from AT rich to GC rich (fig. 1A). For nonmultiplicative codon frequencies (simulated under CNFGTR(ω = 1)), both NF and CF models substantially over- or underestimated ω, with the direction of departure depending on composition and codon usage (fig. 1B). These results imply that estimates of ω obtained under both CF and NF models do not provide reliable evidence of the mode of natural selection.
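To make "multiplicative codon frequencies" and the AT-richness of stop codons concrete, the sketch below builds codon frequencies as normalized products of nucleotide frequencies over the 61 sense codons of the standard genetic code. It is an illustration under stated assumptions, not the authors' implementation: the helper names and the strand-symmetric `composition` function (A = T, G = C) are hypothetical, and only the GC values 30% and 65% are taken from the range quoted in the text.

```python
from itertools import product

# Stop codons of the standard genetic code; all three are AT-rich.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def composition(gc: float) -> dict:
    """Hypothetical strand-symmetric nucleotide frequencies for a GC fraction."""
    at = 1.0 - gc
    return {"A": at / 2, "T": at / 2, "G": gc / 2, "C": gc / 2}


def codon_product(codon: str, nuc_freqs: dict) -> float:
    """Unnormalized product of the nucleotide frequencies in a codon."""
    p = 1.0
    for base in codon:
        p *= nuc_freqs[base]
    return p


def multiplicative_codon_freqs(nuc_freqs: dict) -> dict:
    """Codon frequencies proportional to nucleotide-frequency products,
    renormalized over the sense codons only."""
    raw = {}
    for codon in map("".join, product("ACGT", repeat=3)):
        if codon not in STOP_CODONS:
            raw[codon] = codon_product(codon, nuc_freqs)
    total = sum(raw.values())
    return {codon: p / total for codon, p in raw.items()}


def stop_codon_mass(nuc_freqs: dict) -> float:
    """Mass the raw product assigns to stop codons before renormalization."""
    return sum(codon_product(codon, nuc_freqs) for codon in STOP_CODONS)


if __name__ == "__main__":
    for gc in (0.30, 0.65):
        print(f"GC = {gc:.2f}: stop-codon mass = {stop_codon_mass(composition(gc)):.4f}")
```

Under these assumed compositions the product form places more mass on the stop codons at 30% GC than at 65% GC, so excluding them distorts the remaining frequencies most for AT-rich sequences. Observed codon frequencies in real genes generally do not factor into such products, which is the nonmultiplicative case simulated for fig. 1B.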

