Function prediction from networks of local evolutionary similarity in protein structure.

Erdin S, Venner E, Lisewski AM, Lichtarge O - BMC Bioinformatics (2013)

Bottom Line: One proven computational strategy has been to group a few key functional amino acids into templates and search for these templates in other protein structures, so as to transfer function when a match is found. To this end, we previously developed Evolutionary Trace Annotation (ETA) and showed that diffusing known annotations over a network of template matches on a structural genomic scale improved predictions of function. We improve the accuracy and sensitivity of predictions by using multiple templates per protein structure when constructing networks of ETA matches and diffusing annotations.


Affiliation: Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

ABSTRACT

Background: Annotating protein function with both high accuracy and sensitivity remains a major challenge in structural genomics. One proven computational strategy has been to group a few key functional amino acids into templates and search for these templates in other protein structures, so as to transfer function when a match is found. To this end, we previously developed Evolutionary Trace Annotation (ETA) and showed that diffusing known annotations over a network of template matches on a structural genomic scale improved predictions of function. In order to further increase sensitivity, we now let each protein contribute multiple templates rather than just one, and also let the template size vary.

Results: Retrospective benchmarks in 605 Structural Genomics enzymes showed that multiple templates increased sensitivity by up to 14% when combined with single template predictions even as they maintained the accuracy over 91%. Diffusing function globally on networks of single and multiple template matches marginally increased the area under the ROC curve over 0.97, but in a subset of proteins that could not be annotated by ETA, the network approach recovered annotations for the most confident 20-23 of 91 cases with 100% accuracy.

Conclusions: We improve the accuracy and sensitivity of predictions by using multiple templates per protein structure when constructing networks of ETA matches and diffusing annotations.
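The idea of diffusing known annotations over a network of template matches can be illustrated with a small sketch. This is not the authors' implementation: the actual ETA diffusion scheme is described in the paper's Methods, which are not reproduced here. The function name diffuse_labels, the edge weights, and the clamped iterative averaging rule are all assumptions chosen for illustration.

```python
def diffuse_labels(edges, known, iterations=50):
    """Spread known labels over a weighted, undirected match network.

    edges: dict mapping (node_a, node_b) -> positive match weight (hypothetical).
    known: dict mapping node -> known annotation (e.g. an EC number).
    Returns dict node -> {label: score}. Assumed scheme, not the ETA method.
    """
    # Build a symmetric adjacency map from the edge list.
    neighbors = {}
    for (u, v), w in edges.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w
    for node in known:
        neighbors.setdefault(node, {})

    labels = sorted(set(known.values()))
    scores = {n: {lab: 0.0 for lab in labels} for n in neighbors}
    for node, lab in known.items():
        scores[node][lab] = 1.0

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known:
                # Annotated proteins keep their label fixed (clamped).
                updated[node] = scores[node]
                continue
            total = sum(nbrs.values())
            # Unannotated proteins take the weighted mean of their neighbors.
            updated[node] = {
                lab: (sum(w * scores[m][lab] for m, w in nbrs.items()) / total
                      if total else 0.0)
                for lab in labels
            }
        scores = updated
    return scores


# Toy network: protein X matches A strongly and B weakly.
toy_edges = {("A", "X"): 3.0, ("B", "X"): 1.0}
toy_known = {"A": "EC 1.1.1.1", "B": "EC 2.7.1.1"}
toy_scores = diffuse_labels(toy_edges, toy_known)
```

In this toy network the unannotated protein X ends up with a higher score for the label of its stronger match, which is the kind of confidence ranking a network approach can use to pick its most reliable predictions.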


Figure 2: Accuracy, sensitivity and F-measure of ETA with four template selection methods, six-residue (6R), five-residue (5R), multiple six-residue (M6R), multiple five-residue (M5R), combination of four ETA template selecting modes (ALL) and a sequence-based annotation based on sequence identity (SeqID) for (A) a test set of 605 Structural Genomics enzymes; and for (B) a non-trivial test set of 73 Structural Genomics enzymes.

Mentions: We wish to benchmark ETA's performance across four possible modes: the first selects matches based on one five-residue template per protein (5RT); the second relies instead on one six-residue template per protein (6RT); the third uses multiple five-residue templates per protein (M5RT); and the last uses multiple six-residue templates per protein (M6RT). In each mode, ETA was applied to a set of 605 enzymes with full-EC annotations (see Methods). The performance of each template selection mode, and of the sequence-based strategy, was measured in terms of accuracy, sensitivity, and the F-measure, a weighted mean of the two (see Methods). The results, depicted in Figure 2A, showed that while single templates were more accurate, multiple templates were more sensitive. Overall, the multiple six-residue templates yielded the highest F-measure, suggesting that this is the method of choice.
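As a rough illustration of how the three measures relate, the sketch below scores a set of predictions against known annotations. The exact definitions are given in the paper's Methods, which are not reproduced here, so the following are assumptions: accuracy is taken as correct predictions over predictions made, sensitivity as correct predictions over all benchmark proteins, and the F-measure as their equally weighted harmonic mean. The function name eta_metrics and the toy data are hypothetical.

```python
def eta_metrics(predictions, truth):
    """Return (accuracy, sensitivity, f_measure) under the assumed definitions.

    predictions: dict protein -> predicted annotation, or None if no prediction.
    truth: dict protein -> true annotation for every benchmark protein.
    """
    total = len(truth)
    # Only proteins in the benchmark with an actual prediction count as "made".
    made = {p: ec for p, ec in predictions.items()
            if ec is not None and p in truth}
    correct = sum(1 for p, ec in made.items() if truth[p] == ec)

    accuracy = correct / len(made) if made else 0.0   # assumed: correct / predicted
    sensitivity = correct / total if total else 0.0   # assumed: correct / all proteins
    denom = accuracy + sensitivity
    # Assumed: equally weighted harmonic mean of accuracy and sensitivity.
    f_measure = 2 * accuracy * sensitivity / denom if denom else 0.0
    return accuracy, sensitivity, f_measure


# Toy benchmark: four proteins, three predictions made, two of them correct.
toy_truth = {"p1": "1.1.1.1", "p2": "2.7.1.1", "p3": "3.4.21.4", "p4": "4.2.1.1"}
toy_predictions = {"p1": "1.1.1.1", "p2": "2.7.1.1", "p3": "3.1.1.1", "p4": None}
toy_accuracy, toy_sensitivity, toy_f = eta_metrics(toy_predictions, toy_truth)
```

Under these assumed definitions, a method that makes more predictions can raise sensitivity while lowering accuracy, which matches the trade-off reported between single and multiple templates.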

