A phylogeny-based benchmarking test for orthology inference reveals the limitations of function-based validation.

Trachana K, Forslund K, Larsson T, Powell S, Doerks T, von Mering C, Bork P - PLoS ONE (2014)

Bottom Line: Herein, we used this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. We also examined how our dataset can instruct the selection of a "core" species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection.


Affiliation: Institute for Systems Biology, Seattle, WA, United States of America.

ABSTRACT
Accurate orthology prediction is crucial for many applications in the post-genomic era. The lack of broadly accepted benchmark tests precludes a comprehensive analysis of orthology inference. So far, functional annotation between orthologs serves as a performance proxy. However, this violates the fundamental principle of orthology as an evolutionary definition, while it is often not applicable due to limited experimental evidence for most species. Therefore, we constructed high quality "gold standard" orthologous groups that can serve as a benchmark set for orthology inference in bacterial species. Herein, we used this dataset to demonstrate 1) why a manually curated, phylogeny-based dataset is more appropriate for benchmarking orthology than other popular practices and 2) how it guides database design and parameterization through careful error quantification. More specifically, we illustrate how function-based tests often fail to identify false assignments, misjudging the true performance of orthology inference methods. We also examined how our dataset can instruct the selection of a "core" species repertoire to improve detection accuracy. We conclude that including more genomes at the proper evolutionary distances can influence the overall quality of orthology detection. The curated gene families, called Reference Orthologous Groups, are publicly available at http://eggnog.embl.de/orthobench2.

Figure 2: Benchmarking the eggNOG database. A) To evaluate the performance of the database, we map the members (p1-p6) of every reference orthologous group (e.g. RefOG100) to the predicted orthologous groups and use the predicted orthologous group with the highest coverage (e.g. OG1). Three classes of assignments are defined using the OG1 orthology predictions: true assignments (TA) are the orthologs that have been grouped correctly in the database (black box); missing assignments (MA) are the reference orthologs that were incorrectly excluded by the method (white striped box); false assignments (FA) are the predictions that have been grouped in OG1 but are not reference orthologs (light red box). B) The number of true, false and missing assignments for the eggNOG gamma-proteobacteria-specific orthologous groups (gproNOGs) under this scoring scheme. C) Distribution of FA per orthologous group. Half of the orthologous groups have fewer than 10 falsely assigned proteins (≤ 9) and together contribute less than 10% of this error category. The red box highlights five families that contribute ∼50% of the FA.

To demonstrate how our benchmark dataset can be used to validate orthology inference and to compare its performance with function-based tests, we used eggNOG, our in-house orthology database [12]. To allow a meaningful evaluation of an orthology database, the orthologs (or orthologous groups) should be inferred at the same phylogenetic level as the RefOGs. Accordingly, we mapped the members of each RefOG to the eggNOG orthologous groups for gamma-proteobacteria (gproNOGs) and classified the predicted orthologs into three categories: 1) true assignments (orthologs that have been grouped correctly), 2) false assignments (proteins included in the eggNOG orthologous group that are not reference orthologs) and 3) missing assignments (reference orthologs that were left out in eggNOG) (Figure 2A). In total, we identified 4359 true, 1374 false, and 429 missing assignments. eggNOG correctly clusters 91% of the reference orthologs, but also accumulates a considerable number of false assignments. A closer look at our comparison reveals that almost half of the total false assignments (∼600 out of the 1374 proteins) accumulate in just 5 of the 49 orthologous groups (Figure 2B, Table S2 in Data S1). In all five cases, the true and false assignments share common protein domains; for example, gproNOG00600 (corresponding to RefOG075) shows how the Glt symporter domain (Pfam: PF0316 [36]) supports the grouping of 158 proteins that we can clearly separate in our phylogenetic analysis. In other words, the protein domain content of orthologs, which commonly serves as a validation test of orthology prediction [25], [26] and as a function proxy [9], [37], would classify all five orthologous groups as correct. The phylogeny-based test, by contrast, exposes functionally related false-positive assignments and therefore enables a more accurate database evaluation.
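The scoring scheme described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `score_refog`, the toy protein identifiers, and the tie-breaking behaviour (the first best-covering group wins) are assumptions made for illustration. Only the three totals at the end (4359, 1374, 429) come from the text.

```python
def score_refog(ref_members, predicted_groups):
    """Score one reference group against the predicted group that covers it best.

    ref_members: iterable of protein IDs in the reference orthologous group.
    predicted_groups: dict mapping predicted group ID -> iterable of protein IDs.
    """
    ref = set(ref_members)
    # Choose the predicted group containing the most reference members
    # (highest coverage); ties go to the first such group (an assumption).
    best_id = max(predicted_groups,
                  key=lambda g: len(ref & set(predicted_groups[g])))
    best = set(predicted_groups[best_id])
    return {
        "og": best_id,
        "TA": len(ref & best),   # reference orthologs grouped correctly
        "MA": len(ref - best),   # reference orthologs left out
        "FA": len(best - ref),   # grouped proteins that are not reference orthologs
    }

# Hypothetical toy data mirroring the p1-p6 layout of the figure caption.
refog = ["p1", "p2", "p3", "p4", "p5", "p6"]
predicted = {
    "OG1": ["p1", "p2", "p3", "p4", "x1", "x2"],
    "OG2": ["p5"],
    "OG3": ["p6", "x3"],
}
result = score_refog(refog, predicted)

# Totals reported in the text: true, false and missing assignments.
ta, fa, ma = 4359, 1374, 429
recall = ta / (ta + ma)      # share of reference orthologs recovered
top5_share = 600 / fa        # approximate share of FA in the five largest offenders
print(result, f"{recall:.1%}", f"{top5_share:.1%}")
```

With the reported totals, the fraction of reference orthologs recovered is 4359 / (4359 + 429), which matches the 91% quoted in the text, and roughly 600 of 1374 false assignments is a little under half.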
To quantify how often function-based tests fail to correctly validate automated orthology predictions compared to our phylogeny-based test, we investigated whether false and missing assignments can be differentiated on the basis of three functional/genomic features: 1) gene order, 2) protein domain content, and 3) Enzyme Commission (EC) numbers. We limited our comparison to these three attributes, as not all proposed tests [24]–[27] are applicable to gamma-proteobacteria. We retrieved each of these features (Tables S3–S5 in Data S1) for every protein classified as a true, false or missing assignment (Material and Methods). Each function-based test works well for capturing missing assignments, but the phylogeny-based test outperforms the function-based tests in identifying false assignments (Figure 3; Figures S1–S3 in File S1). Many false assignments would be considered "true" orthologs if evaluated only on the basis of these functional features, which reflects the limitations of function-based tests. A closer manual inspection of the false assignments (i.e. analyzing alignment quality and the phylogenetic trees built for the families) further justifies the phylogeny-based validation (data not shown). Of the three features, gene order identifies false assignments most accurately, illustrating again the need to combine orthology predictions with synteny information where available [27], [38]–[40].
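The failure mode described above, in which a function-based test accepts a false assignment because it shares a domain with the true orthologs, can be sketched as follows. This is a toy illustration under stated assumptions, not the procedure from Material and Methods: the function `domain_test_flags`, the protein IDs, and the domain labels `DOM_A` and `DOM_B` are all hypothetical, and the real tests may score domain content differently.

```python
def domain_test_flags(group_domains, reference_members):
    """Return non-reference members that a simple domain-content test would flag.

    A member is flagged only if it shares no domain with any reference member.
    group_domains: dict mapping protein ID -> set of domain labels.
    reference_members: set of protein IDs known to be reference orthologs.
    """
    ref_domains = set()
    for protein in reference_members:
        ref_domains |= group_domains.get(protein, set())
    return {
        protein
        for protein, domains in group_domains.items()
        if protein not in reference_members and not (domains & ref_domains)
    }

# Hypothetical predicted group: p1 and p2 are reference orthologs; x1 and x2
# are false assignments according to the phylogeny-based benchmark.
group_domains = {
    "p1": {"DOM_A"},
    "p2": {"DOM_A"},
    "x1": {"DOM_A"},   # shares the domain, so the domain test accepts it
    "x2": {"DOM_B"},   # different domain, so the domain test flags it
}
reference = {"p1", "p2"}
phylogeny_false = {"x1", "x2"}

flagged = domain_test_flags(group_domains, reference)
missed = phylogeny_false - flagged   # false assignments the domain test cannot see
print(sorted(flagged), sorted(missed))
```

In this toy case the domain test catches `x2` but misses `x1`, which is the same pattern the text reports for groups whose true and false members share a common protein domain.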

