Rock, paper, scissors: harnessing complementarity in ortholog detection methods improves comparative genomic inference.
Bottom Line: This task involves accurately identifying genes across species that descend from a common ancestral sequence. In head-to-head comparisons, we find that these algorithms significantly outperform one another for 38-45% of the genes analyzed. Further, this improvement in alignment quality yields more confidently aligned sites and higher levels of overall conservation, while simultaneously detecting up to 180% more positively selected sites.
Affiliation: Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, California.
Mentions: All ortholog detection methods produce false positives. For example, a false positive can result when a gene deletion on one lineage means that no true ortholog exists in a given species. Typically, these issues are dealt with through rigorous filtering of input alignments. The intuition is that by applying a stringent sequence similarity filter, we can remove the vast majority of evolutionarily unrelated genes. We use this filtering approach to ensure that only credible, putatively orthologous sequences are included in the analysis. Because of heterogeneity in genome quality, similarity cutoffs were chosen heuristically, considering the known level of genome-wide divergence between human and the species of interest, as well as the overall distributions of percent identity between putative orthologs in the two species. Specifically, we first chose a cutoff based on the species-specific levels of percent identity to human. We then updated these numbers based on spot checks of borderline alignment cases. The resulting cutoffs were as follows: chimp: 82%, gorilla: 77%, orangutan: 75%, rhesus macaque: 73%. A cutoff of 70% was employed for marmoset, bushbaby, cat, cow, and horse. For applications where consistency across methods is not important, these cutoffs could instead be chosen using downstream quality metrics such as those presented in Figure 4. Note that such an approach would still require the user to specify a tradeoff between the quality and number of orthologs.
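The species-specific filter described above could be sketched roughly as follows. This is a minimal illustration and not the authors' code: the cutoff values are taken from the text, but the function names (`percent_identity`, `passes_filter`), the definition of percent identity (matches over columns where neither sequence has a gap), and the choice to treat the cutoff as inclusive are all assumptions made for the example.

```python
# Percent-identity cutoffs (relative to human) as listed in the text.
CUTOFFS = {
    "chimp": 82.0,
    "gorilla": 77.0,
    "orangutan": 75.0,
    "rhesus macaque": 73.0,
    "marmoset": 70.0,
    "bushbaby": 70.0,
    "cat": 70.0,
    "cow": 70.0,
    "horse": 70.0,
}


def percent_identity(human_aln: str, other_aln: str) -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: columns where either sequence has a gap ('-') are
    ignored; the source does not say how identity was computed.
    """
    if len(human_aln) != len(other_aln):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(human_aln, other_aln):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def passes_filter(species: str, human_aln: str, other_aln: str) -> bool:
    """True if the putative ortholog meets the cutoff for its species.

    Assumption: the cutoff is inclusive (>=).
    """
    return percent_identity(human_aln, other_aln) >= CUTOFFS[species]
```

For instance, a toy alignment with 4 matching columns out of 5 ungapped columns has 80% identity, which would be retained under the gorilla cutoff (77%) but discarded under the chimp cutoff (82%).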