Estimates of the effect of natural selection on protein-coding content.

Yap VB, Lindsay H, Easteal S, Huttley G - Mol. Biol. Evol. (2009)

Bottom Line: For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.


Affiliation: Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore. stayapvb@nus.edu.sg

ABSTRACT
Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease gave significant discrepancies in estimates with approximately 10-30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.
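
The interpretation of omega described in the abstract can be sketched as a simple threshold rule. This is a minimal illustration only: the function name `classify_omega` and the optional tolerance `tol` are hypothetical and do not come from the paper, and a real analysis would test whether an estimate differs significantly from 1 rather than compare a point estimate directly.

```python
def classify_omega(omega: float, tol: float = 0.0) -> str:
    """Label a single omega estimate using the thresholds from the abstract.

    `tol` is a hypothetical tolerance (an assumption for illustration):
    estimates within `tol` of 1 are treated as neutral.
    """
    if omega < 0:
        raise ValueError("omega is a ratio of rates and cannot be negative")
    if omega > 1 + tol:
        return "positive selection"   # omega > 1
    if omega < 1 - tol:
        return "purifying selection"  # omega < 1
    return "neutral"                  # omega = 1 (within tolerance)


# Illustrative values only, not results from the paper.
for value in (0.2, 1.0, 1.4):
    print(value, classify_omega(value))
```

Because the classification depends entirely on which side of 1 the estimate falls, a model that systematically biases omega downward or upward can change the label assigned to a gene.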


Figure 3: Evidence the CF model is prone to underestimating positive natural selection in Plasmodium. Plotted are estimates of ω from the models indicated by subscript on the x and y axes. The gray region corresponds to the realm of ω values representing neutral or purifying natural selection. Values of ω outside this zone indicate positive natural selection. The left and right plot columns are from Plasmodium control and ligand loci, respectively. Dashed diagonal lines correspond to a slope of 1.
© Copyright Policy - open-access

Mentions: The high error rate for CF applied to Plasmodium (table 3) arises from systematic underestimation of ω, an effect that can cause strong candidates for adaptive evolution to be misclassified. The molecular arms race underway between Plasmodium parasites and their hosts predicts that Plasmodium genes that mediate interactions with the host should exhibit evidence for adaptation. However, our results (fig. 1A and table 3) suggested that the low GC% of some Plasmodium genomes will cause CF to systematically underestimate ω, potentially providing false-negative evidence of the involvement of genes in host–parasite interactions. We confirmed this potential in an analysis of Plasmodium genes classified by experimental evidence as ligands or not ligands and thus likely or unlikely candidates for adaptive evolution, respectively (Weedall et al. 2008). Using orthologous gene pairs from Plasmodium species with AT-rich genomes, ω estimated from the CFHKY model was systematically underestimated for both the control and the adaptive candidate genes (points were typically scattered below the diagonal, fig. 3). In contrast, for a small number of candidate genes, the estimate of ω from CNFGTR lay within the zone indicative of adaptive evolution, supporting an adaptive role for these genes. Although the estimates of ω from NF and CNF were largely indistinguishable (table 3), a general trend toward overestimation by NF was evident (an excess of points was scattered above the diagonal, fig. 3).
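
The comparison against the diagonal described above can be sketched in a few lines. This is an illustrative sketch, not the authors' analysis code: the function name `summarize_against_diagonal` and the example values are hypothetical, and each pair is assumed to hold the ω estimates for one gene from the model on the x axis and the model on the y axis.

```python
from typing import Dict, Iterable, Tuple


def summarize_against_diagonal(pairs: Iterable[Tuple[float, float]]) -> Dict[str, int]:
    """Count paired omega estimates relative to the slope-1 diagonal.

    Each pair is (x_estimate, y_estimate) for one gene. A pair is counted
    as 'discordant' when exactly one of the two estimates exceeds 1, i.e.
    the two models disagree about evidence for positive selection.
    """
    counts = {"below": 0, "above": 0, "on": 0, "discordant": 0}
    for x, y in pairs:
        if y < x:
            counts["below"] += 1   # y-axis model gives the lower estimate
        elif y > x:
            counts["above"] += 1   # y-axis model gives the higher estimate
        else:
            counts["on"] += 1
        if (x > 1.0) != (y > 1.0):
            counts["discordant"] += 1
    return counts


# Hypothetical estimates for four genes; not data from the paper.
example_pairs = [(0.4, 0.3), (1.3, 0.8), (0.9, 0.9), (1.5, 1.6)]
print(summarize_against_diagonal(example_pairs))
```

A consistent excess of points below the diagonal indicates that the y-axis model tends to return lower estimates than the x-axis model, and the discordant pairs are the genes whose classification would change depending on the model used.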

