Phylogenetic and functional assessment of orthologs inference projects and methods.

Altenhoff AM, Dessimoz C - PLoS Comput. Biol. (2009)

Bottom Line: We systematically compared their predictions with respect to both phylogeny and function, using six different tests. Second, it introduces new methodology to verify orthology. And third, it sets performance standards for current and future approaches.


Affiliation: Institute of Computational Science, ETH Zurich, and Swiss Institute of Bioinformatics, Zürich, Switzerland. adrian.altenhoff@inf.ethz.ch

ABSTRACT
Accurate genome-wide identification of orthologs is a central problem in comparative genomics, a fact reflected by the numerous orthology identification projects developed in recent years. However, only a few reports have compared their accuracy, and indeed, several recent efforts have not yet been systematically evaluated. Furthermore, orthology is typically only assessed in terms of function conservation, despite the phylogeny-based original definition of Fitch. We collected and mapped the results of nine leading orthology projects and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene, RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and reciprocal smallest distance). We systematically compared their predictions with respect to both phylogeny and function, using six different tests. This required the mapping of millions of sequences, the handling of hundreds of millions of predicted pairs of orthologs, and the computation of tens of thousands of trees. In phylogenetic analysis or in functional analysis where high specificity is required, we find that OMA and Homologene perform best. At lower functional specificity but higher coverage level, OrthoMCL outperforms Ensembl Compara, and to a lesser extent Inparanoid. Lastly, the large coverage of the recent EggNOG can be of interest to build broad functional grouping, but the method is not specific enough for phylogenetic or detailed function analyses. In terms of general methodology, we observe that the more sophisticated tree reconstruction/reconciliation approach of Ensembl Compara was at times outperformed by pairwise comparison approaches, even in phylogenetic tests. Furthermore, we show that standard bidirectional best-hit often outperforms projects with more complex algorithms. First, the present study provides guidance for the broad community of orthology data users as to which database best suits their needs. Second, it introduces new methodology to verify orthology. And third, it sets performance standards for current and future approaches.
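
The bidirectional best-hit (BBH) baseline named in the abstract can be illustrated with a minimal sketch. It assumes that pairwise alignment scores between the genes of two genomes have already been computed; the `scores` dictionary, the gene names, and the function `bidirectional_best_hits` are hypothetical and are not taken from the study.

```python
def bidirectional_best_hits(scores):
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit.

    scores: dict mapping (gene_in_genome_A, gene_in_genome_B) -> alignment score.
    Hypothetical input; a higher score means a more similar pair.
    """
    best_for_a = {}  # gene in A -> (best partner in B, score)
    best_for_b = {}  # gene in B -> (best partner in A, score)
    for (a, b), s in scores.items():
        if a not in best_for_a or s > best_for_a[a][1]:
            best_for_a[a] = (b, s)
        if b not in best_for_b or s > best_for_b[b][1]:
            best_for_b[b] = (a, s)
    # Keep only the pairs whose best hits point at each other.
    return sorted(
        (a, b) for a, (b, _) in best_for_a.items() if best_for_b[b][0] == a
    )


# Made-up scores for three genes in genome A and two genes in genome B.
scores = {
    ("a1", "b1"): 250.0,
    ("a1", "b2"): 90.0,
    ("a2", "b1"): 120.0,
    ("a2", "b2"): 80.0,
    ("a3", "b2"): 75.0,
}
pairs = bidirectional_best_hits(scores)
```

In this toy input only `a1` and `b1` pick each other, so `a2` and `a3` receive no predicted ortholog. Ties are resolved here by keeping the first pair encountered, which is an arbitrary choice of this sketch.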


Figure 4 (pcbi-1000262-g004): Results of functional based tests. Results of functional conservation tests for GO similarity, EC number, expression correlation and gene neighborhood conservation. In the pairwise project comparisons (left) the relative difference of functional similarity between OMA and its counter project versus the relative difference of the number of predicted orthologs are shown. In the comparison on the intersection set (right), the mean functional similarity versus the number of predicted orthologs on the common set of sequences are shown. The vertical error bars in all the results state the 95% confidence interval of the means. The “better arrow” indicates the direction towards higher specificity and sensitivity. Projects lying in the gray area are dominated by “OMA Pairwise” in the pairwise comparison (left) and by at least one other project in the intersection comparison (right).
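
The caption states that the error bars give the 95% confidence interval of the means but does not say how the interval was computed. A minimal sketch under the assumption of a normal approximation (mean plus or minus 1.96 standard errors) is shown here; the function name `mean_ci95` and the sample values are hypothetical.

```python
import math
import statistics


def mean_ci95(values):
    """Return (mean, lower, upper) using a normal-approximation 95% interval.

    Assumption of this sketch: the interval is mean +/- 1.96 * standard error,
    with the sample standard deviation (n - 1 in the denominator).
    """
    n = len(values)
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


# Made-up similarity scores for a handful of ortholog pairs.
mean, lower, upper = mean_ci95([0.2, 0.4, 0.6, 0.8])
```

For the small samples used in this toy example a Student t quantile would be more appropriate than 1.96; with many thousands of pairs the two are practically identical.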

Mentions: Figure 4A shows the average similarity of GO annotations in pairs of orthologs from the different projects. The mean similarity of all projects falls in a relatively small range, and is quite low. COG/KOG/EggNOG make comparatively many predictions, but their average similarity score is significantly lower. Hence, the results of COG/KOG/EggNOG are particularly suited for coarse-grained functional classification. On the other hand, if a high functional similarity is desired, the relatively simple BBH approach dominates more sophisticated algorithms such as RoundUp and Homologene (which make fewer predictions at the same degree of similarity) or OMA (which makes only a few more predictions, but at a significantly lower degree of similarity). This result suggests that sequence similarity is a stronger predictor of functional relatedness than the evolutionary history of the genes. At mid specificity level, OrthoMCL outperforms Ensembl Compara and Inparanoid, yielding many more predictions at roughly the same similarity level.
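
The sense in which one project "dominates" another in this comparison can be read as Pareto dominance over two axes: number of predicted orthologs and mean functional similarity. The sketch below assumes that reading; the method names and all numbers are invented for illustration and are not results from the study.

```python
def dominates(p, q):
    """True if p is at least as good as q on both axes and strictly better on one.

    p and q are (num_predictions, mean_similarity) tuples.
    """
    at_least_as_good = p[0] >= q[0] and p[1] >= q[1]
    strictly_better = p[0] > q[0] or p[1] > q[1]
    return at_least_as_good and strictly_better


def non_dominated(projects):
    """Return the sorted names of projects that no other project dominates."""
    return sorted(
        name
        for name, point in projects.items()
        if not any(
            dominates(other_point, point)
            for other, other_point in projects.items()
            if other != name
        )
    )


# Hypothetical (num_predictions, mean_similarity) values.
projects = {
    "method_A": (1000, 0.50),
    "method_B": (800, 0.50),   # fewer predictions at the same similarity
    "method_C": (5000, 0.30),  # many predictions, low similarity
    "method_D": (1200, 0.45),  # slightly more predictions, lower similarity
}
front = non_dominated(projects)
```

Here `method_B` is dominated by `method_A`, while `method_C` and `method_D` trade similarity for coverage and therefore stay on the front. This check ignores the confidence intervals, so a difference that is not statistically significant would still count as dominance in this sketch.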

