Limits...
Rock, paper, scissors: harnessing complementarity in ortholog detection methods improves comparative genomic inference.

Maher MC, Hernandez RD - G3 (Bethesda) (2015)

Bottom Line: OD methods comprise a wide variety of approaches, each with its own benefits and costs under a variety of evolutionary and practical scenarios. In head-to-head comparisons, we find that these algorithms significantly outperform one another for 38-45% of the genes analyzed. Further, this improvement in alignment quality yields more confidently aligned sites and higher levels of overall conservation, while simultaneously detecting up to 180% more positively selected sites.

Affiliation: Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California.


Figure 4: OD power and the effect of pooling methods. (A) The cumulative proportion of human transcripts for which an ortholog was detected, stratified by species. Envelopes illustrate results from pooling an increasing number of methods. (B) The cumulative number of human transcripts as a function of the maximum number of missing species allowed.

Mentions: All ortholog detection methods produce false positives. One source is a gene deletion on one lineage, which leaves no true ortholog in a given species. Such cases are typically handled by rigorously filtering the input alignments: a stringent sequence similarity filter should remove the vast majority of evolutionarily unrelated genes. We use this filtering approach to ensure that only credible, putatively orthologous sequences are included in the analysis. Because of heterogeneity in genome quality, similarity cutoffs were chosen heuristically, considering the known level of genome-wide divergence between human and the species of interest, as well as the overall distributions of percent identity between putative orthologs in the two species. Specifically, we first chose a cutoff based on the species-specific levels of percent identity to human, then updated these numbers based on spot checks of borderline alignment cases. The resulting cutoffs were: chimp, 82%; gorilla, 77%; orangutan, 75%; rhesus macaque, 73%; and 70% for marmoset, bushbaby, cat, cow, and horse. For applications where consistency across methods is not important, these cutoffs could instead be chosen using downstream quality metrics such as those presented in Figure 4. Note that such an approach would still require the user to specify a tradeoff between the quality and number of orthologs.
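
The species-specific similarity filter described above can be illustrated with a minimal Python sketch. The cutoff values are the ones listed in the text. Everything else is an assumption made for illustration: the function names `percent_identity` and `passes_filter` are hypothetical, the excerpt does not say how percent identity was computed (here, gapped columns are simply skipped), and treating a value exactly equal to the cutoff as passing is a guess.

```python
# Minimum percent identity to human, per species, as listed in the text.
CUTOFFS = {
    "chimp": 82.0,
    "gorilla": 77.0,
    "orangutan": 75.0,
    "rhesus macaque": 73.0,
    "marmoset": 70.0,
    "bushbaby": 70.0,
    "cat": 70.0,
    "cow": 70.0,
    "horse": 70.0,
}


def percent_identity(aligned_a, aligned_b):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: the source does not specify gap handling; skipping gapped
    columns is one common convention, not necessarily the authors' choice.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_filter(species, human_aln, other_aln, cutoffs=CUTOFFS):
    """Keep a putative ortholog only if it meets the species' cutoff.

    Assumption: the comparison is inclusive (>=); the text does not say.
    """
    return percent_identity(human_aln, other_aln) >= cutoffs[species]
```

For example, `passes_filter("chimp", human_aln, chimp_aln)` would retain a putative chimp ortholog only when its pairwise alignment to the human sequence reaches 82% identity under this definition. Keeping the cutoffs in one dictionary makes it easy to swap in values chosen from downstream quality metrics instead.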