Estimates of the effect of natural selection on protein-coding content.

Yap VB, Lindsay H, Easteal S, Huttley G - Mol. Biol. Evol. (2009)

Bottom Line: For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.


Affiliation: Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore. stayapvb@nus.edu.sg

ABSTRACT
Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease gave significant discrepancies in estimates with approximately 10-30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.
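
The interpretation of omega given above can be written as a small helper. This is a minimal sketch for illustration only: the function name `classify_omega` and the tolerance `tol` are hypothetical and do not come from the article. In practice, a departure of omega from 1 would be assessed with a statistical test rather than by thresholding a point estimate, and the article's point is that the estimate itself can be biased by sequence composition.

```python
def classify_omega(omega: float, tol: float = 0.0) -> str:
    """Map an omega value (nonsynonymous/synonymous ratio) to a selection regime.

    `tol` is a hypothetical tolerance band around 1; it is an assumption
    made for this sketch, not a value taken from the article.
    """
    if omega < 0:
        raise ValueError("omega is a ratio of rates and cannot be negative")
    if omega > 1 + tol:
        return "positive Darwinian selection"
    if omega < 1 - tol:
        return "purifying selection"
    return "neutral evolution"


if __name__ == "__main__":
    for value in (0.2, 1.0, 1.8):
        print(value, "->", classify_omega(value))
```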



Fig. 4: Incorrect Type 1 error rates for CF and NF in testing the null hypothesis of one class of sites against the alternative of two site classes. The sequences were the same as those from figure 1B with GC% ≈ 50, simulated under CNFGTR(ω = 1). The dashed diagonal line is the expected quantile relationship for χ²₂.

Mentions: We tested this prediction for the case of an alternate hypothesis of among-site heterogeneity of ω, using a simple form of mixture model that specifies two site classes with neutral positions evolving according to 0 ≤ ω ≤ 1 and adaptive positions evolving according to ω > 1. Using the sequences simulated under a single site class CNF model with ∼50% GC (fig. 1B), we found, as predicted, that the CF form was conspicuously prone to false positives, whereas the NF model was weakly conservative (fig. 4).
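
The false-positive check described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the likelihood ratio statistics from the null simulations are already available as a list of numbers, and that the reference distribution is χ² with 2 degrees of freedom (our reading of the figure 4 caption). All function names are hypothetical. For 2 degrees of freedom the χ² survival function has the closed form exp(-x/2), so only the standard library is needed.

```python
import math


def chi2_df2_sf(x: float) -> float:
    """Survival function P(X > x) for chi-square with 2 degrees of freedom."""
    return math.exp(-x / 2.0) if x > 0 else 1.0


def chi2_df2_quantile(p: float) -> float:
    """Quantile (inverse CDF) of chi-square with 2 df, for 0 <= p < 1."""
    return -2.0 * math.log(1.0 - p)


def type1_error_rate(lr_stats, alpha=0.05):
    """Fraction of null-simulated LR statistics that reject at level alpha."""
    if not lr_stats:
        raise ValueError("need at least one likelihood ratio statistic")
    rejections = sum(1 for lr in lr_stats if chi2_df2_sf(lr) < alpha)
    return rejections / len(lr_stats)


def qq_pairs(lr_stats):
    """Pair expected chi-square(2) quantiles with sorted observed statistics."""
    n = len(lr_stats)
    ordered = sorted(lr_stats)
    return [(chi2_df2_quantile((i + 0.5) / n), obs)
            for i, obs in enumerate(ordered)]


if __name__ == "__main__":
    # Made-up LR statistics, for demonstration only (not data from the article).
    demo = [0.1, 1.0, 7.0, 10.0]
    print(type1_error_rate(demo))
    print(qq_pairs(demo))
```

When the data are simulated under the null, a rejection rate well above `alpha` indicates a test that is prone to false positives, as reported here for the CF form, while a rate below `alpha` indicates a conservative test, as reported for NF. Plotting the pairs from `qq_pairs` against the diagonal gives the kind of quantile comparison that figure 4 describes.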

