A phylogeny-based benchmarking test for orthology inference reveals the limitations of function-based validation.

Trachana K, Forslund K, Larsson T, Powell S, Doerks T, von Mering C, Bork P - PLoS ONE (2014)

Bottom Line: Herein, we used this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. We also examined how our dataset can instruct the selection of a "core" species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection.

Affiliation: Institute for Systems Biology, Seattle, WA, United States of America.

ABSTRACT
Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, the functional annotation shared between orthologs has served as a proxy for performance. However, this practice violates the fundamental principle that orthology is an evolutionary definition, and it is often not applicable because experimental evidence is limited for most species. Therefore, we constructed high-quality "gold standard" orthologous groups that can serve as a benchmark set for orthology inference in bacterial species. Herein, we used this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. More specifically, we illustrate how function-based tests often fail to identify false assignments and thus misjudge the true performance of orthology inference methods. We also examined how our dataset can instruct the selection of a "core" species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection. The curated gene families, called Reference Orthologous Groups, are publicly available at http://eggnog.embl.de/orthobench2.

Figure 3 (pone-0111122-g003). Function-based benchmark tests do not separate false from true assignments. Consensus functional annotations were determined for every RefOG based on (i) gene order (number of neighboring gene families conserved across RefOG members), (ii) protein domain content (number of protein domains conserved across RefOG members) and (iii) enzymatic activity (number of EC digits conserved across RefOG members) (Material and Methods). A) Boxplots show the distribution of conserved features across true, missing and false assignments for each family. The upper and lower boxplot panels exemplify families where the functional feature does and does not, respectively, discriminate between false and true assignments. B) Bar plots show the number of orthologous groups that would be classified as “accurately inferred” by function-based tests. C) Density plots illustrate how well each function-based test discriminates true, missing and false assignments (density of the mean number of conserved features for every assignment category/RefOG). The data for all 49 RefOGs are shown in Figures S1–S3 in File S1 and Tables S2–S4 in Data S1.
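The three consensus features in the caption reduce to simple set and prefix comparisons. The Python sketch below is one possible reading, not the authors' procedure (which is given in Material and Methods): it assumes that a "conserved" protein domain or neighboring gene family is one present in every member, and that conserved EC digits are the leading levels of the four-level EC number shared by all members. The function names are hypothetical.

```python
from typing import Iterable, List, Set


def conserved_ec_digits(ec_numbers: List[str]) -> int:
    """Count the leading EC levels (0-4) shared by all members.

    Assumes dotted EC strings such as "2.7.1.69"; a "-" marks an
    unassigned level and is never counted as conserved.
    """
    if not ec_numbers:
        return 0
    split = [ec.split(".") for ec in ec_numbers]
    conserved = 0
    # zip(*split) walks the EC hierarchy one level at a time.
    for level in zip(*split):
        first = level[0]
        if first == "-" or any(digit != first for digit in level):
            break
        conserved += 1
    return conserved


def conserved_set_features(feature_sets: Iterable[Set[str]]) -> int:
    """Count features (e.g. domains or neighboring gene families)
    present in every member's feature set."""
    sets = list(feature_sets)
    if not sets:
        return 0
    return len(set.intersection(*sets))


# Hypothetical inputs for illustration only.
print(conserved_ec_digits(["2.7.1.69", "2.7.1.69", "2.7.1.199"]))  # 3
print(conserved_set_features([{"domA", "domB"}, {"domB", "domC"}, {"domB"}]))  # 1
```

To score a single protein against a RefOG consensus, as the per-assignment boxplots in panel A suggest, the same functions can be applied to a two-element input, e.g. `conserved_ec_digits([consensus_ec, protein_ec])`; this per-protein use is likewise an assumption.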

To demonstrate how our benchmark dataset can be used to validate orthology inference, and to compare its performance with that of function-based tests, we use eggNOG, our in-house orthology database [12]. For an orthology database to be evaluated meaningfully, its orthologs (or orthologous groups) must be inferred at the same phylogenetic level as the RefOGs. Accordingly, we mapped the members of each RefOG to the eggNOG orthologous groups for gamma-proteobacteria (gproNOGs) and classified the predicted orthologs into three categories: 1) true assignments (reference orthologs that were grouped correctly), 2) false assignments (proteins included in the eggNOG orthologous group that are not reference orthologs) and 3) missing assignments (reference orthologs that were left out of the eggNOG group) (Figure 2A). In total, we identified 4359, 1374 and 429 proteins in these categories, respectively. eggNOG thus correctly clusters 91% of the reference orthologs (4359 of 4788), but it also accumulates a considerable number of false assignments. A closer look reveals that almost half of all false assignments (∼600 of the 1374 proteins) are concentrated in just 5 of the 49 orthologous groups (Figure 2B, Table S2 in Data S1). In all five cases, the true and false assignments share common protein domains; for example, in gproNOG00600 (corresponding to RefOG075), the Glt symporter domain (Pfam: PF0316 [36]) supports the grouping of 158 proteins that we can clearly separate in our phylogenetic analysis. In other words, protein domain content, which commonly serves both as a validation test for orthology predictions [25], [26] and as a proxy for function [9], [37], would classify all five orthologous groups as correct. The phylogeny-based test, by contrast, exposes these functionally related false-positive assignments and therefore enables a more accurate evaluation of the database.
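The three assignment categories are plain set differences between a reference group and a predicted group. The following Python sketch assumes, for simplicity, that each RefOG is compared with a single predicted eggNOG group (the actual mapping may involve several gproNOGs); the class and function names are illustrative, not part of eggNOG.

```python
from dataclasses import dataclass
from typing import Set


@dataclass
class AssignmentCounts:
    true: int     # reference orthologs grouped correctly
    false: int    # predicted members that are not reference orthologs
    missing: int  # reference orthologs left out of the predicted group

    @property
    def recall(self) -> float:
        """Fraction of reference orthologs recovered by the prediction."""
        return self.true / (self.true + self.missing)

    @property
    def precision(self) -> float:
        """Fraction of predicted members that are reference orthologs."""
        return self.true / (self.true + self.false)


def classify(reference: Set[str], predicted: Set[str]) -> AssignmentCounts:
    """Split one RefOG / predicted-group pair into the three categories."""
    return AssignmentCounts(
        true=len(reference & predicted),
        false=len(predicted - reference),
        missing=len(reference - predicted),
    )


# Totals reported in the text, summed over all 49 RefOGs.
totals = AssignmentCounts(true=4359, false=1374, missing=429)
print(f"recall={totals.recall:.2f} precision={totals.precision:.2f}")
# prints recall=0.91 precision=0.76
```

Plugging in the reported totals reproduces the 91% figure as recall. The same counts imply a precision of about 76%; that value is derived here rather than quoted in the article, and it shows how a high recovery rate can coexist with a substantial number of false assignments.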
To quantify how often function-based tests fail to validate automated orthology predictions correctly, compared with our phylogeny-based test, we investigated whether false and missing assignments can be distinguished from true assignments on the basis of three functional/genomic features: 1) gene order, 2) protein domain content and 3) Enzyme Commission (EC) numbers. We limited the comparison to these three attributes because not all proposed tests [24]–[27] are applicable to gamma-proteobacteria. We retrieved each of these features (Tables S3–S5 in Data S1) for every protein classified as a true, false or missing ortholog (Material and Methods). Each function-based test captures missing assignments well, but the phylogeny-based test outperforms all of them in identifying false assignments (Figure 3; Figures S1–S3 in File S1). Many false assignments would therefore be accepted as “true” orthologs if evaluated only on these functional features, which reflects the limitations of function-based tests. A closer manual inspection of the false assignments (i.e. analysis of alignment quality and of the phylogenetic trees built for the families) further supports the phylogeny-based validation (data not shown). Of the three validation features, gene order identifies false assignments most accurately, again illustrating the need to combine orthology predictions with synteny information where available [27], [38]–[40].
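In its simplest form, a function-based test accepts an assignment when it carries enough of the RefOG's conserved features. The Python sketch below assumes such a threshold rule (the article compares feature distributions rather than prescribing a cutoff) and runs it on made-up scores, not on the data in Tables S3–S5. A feature discriminates well only if false assignments pass far less often than true ones; the toy false assignments here share most features, mirroring the shared-domain case described above.

```python
from typing import Dict, List


def pass_rate(scores: List[int], threshold: int) -> float:
    """Fraction of assignments whose conserved-feature count is >= threshold."""
    if not scores:
        return 0.0
    return sum(score >= threshold for score in scores) / len(scores)


def discrimination_gap(scores_by_category: Dict[str, List[int]], threshold: int) -> float:
    """Pass rate of true assignments minus pass rate of false assignments.

    Values near 0 mean the feature cannot tell false from true assignments.
    """
    return (pass_rate(scores_by_category["true"], threshold)
            - pass_rate(scores_by_category["false"], threshold))


# Toy conserved-feature counts per assignment (hypothetical, illustration only).
toy_scores = {
    "true": [2, 2, 1, 2],
    "false": [2, 1, 2, 0],
    "missing": [2, 2, 2],
}

for category, scores in toy_scores.items():
    print(category, pass_rate(scores, threshold=1))
print("gap", discrimination_gap(toy_scores, threshold=1))
```

With these toy numbers, three of four false assignments pass at a threshold of one shared feature, so the test would rate the group as largely correct even though those members are not orthologs; only an evolutionary comparison of the kind described in the text can expose them.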

