Function prediction from networks of local evolutionary similarity in protein structure.

Erdin S, Venner E, Lisewski AM, Lichtarge O - BMC Bioinformatics (2013)

Bottom Line: One proven computational strategy has been to group a few key functional amino acids into templates and search for these templates in other protein structures, so as to transfer function when a match is found. To this end, we previously developed Evolutionary Trace Annotation (ETA) and showed that diffusing known annotations over a network of template matches on a structural genomic scale improved predictions of function. We improve the accuracy and sensitivity of predictions by using multiple templates per protein structure when constructing networks of ETA matches and diffusing annotations.


Affiliation: Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

ABSTRACT

Background: Annotating protein function with both high accuracy and sensitivity remains a major challenge in structural genomics. One proven computational strategy has been to group a few key functional amino acids into templates and search for these templates in other protein structures, so as to transfer function when a match is found. To this end, we previously developed Evolutionary Trace Annotation (ETA) and showed that diffusing known annotations over a network of template matches on a structural genomic scale improved predictions of function. In order to further increase sensitivity, we now let each protein contribute multiple templates rather than just one, and also let the template size vary.

Results: Retrospective benchmarks on 605 Structural Genomics enzymes showed that multiple templates increased sensitivity by up to 14% when combined with single-template predictions, while maintaining accuracy above 91%. Diffusing function globally on networks of single- and multiple-template matches marginally increased the area under the ROC curve over 0.97, but in a subset of proteins that could not be annotated by ETA, the network approach recovered annotations for the most confident 20-23 of 91 cases with 100% accuracy.

Conclusions: We improve the accuracy and sensitivity of predictions by using multiple templates per protein structure when constructing networks of ETA matches and diffusing annotations.
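To illustrate the idea of diffusing known annotations over a network of template matches, here is a minimal sketch in Python. It uses generic label propagation with clamped known labels, which is an assumption made for illustration and not the competitive diffusion procedure used by the authors. The function name, the protein identifiers, the function labels, and the edge weights are all hypothetical.

```python
from collections import defaultdict


def diffuse_annotations(edges, known, n_iter=50):
    """Generic clamped label propagation (illustrative only; NOT the
    paper's competitive diffusion).

    edges: iterable of (protein_a, protein_b, weight) template matches.
    known: dict mapping annotated proteins to their function label.
    Returns a dict: protein -> {label: score}.
    """
    # Build a weighted, undirected adjacency map from the match list.
    nbrs = defaultdict(dict)
    for u, v, w in edges:
        nbrs[u][v] = w
        nbrs[v][u] = w

    labels = sorted(set(known.values()))
    # Every protein starts with zero score for every label ...
    scores = {n: {l: 0.0 for l in labels} for n in nbrs}
    # ... except annotated proteins, which carry all score on their label.
    for n, l in known.items():
        scores.setdefault(n, {x: 0.0 for x in labels})[l] = 1.0

    for _ in range(n_iter):
        new = {}
        for n in scores:
            if n in known:
                new[n] = scores[n]  # known annotations stay fixed
                continue
            total = sum(nbrs[n].values())
            # Unannotated proteins take the weighted mean of neighbour scores.
            new[n] = {
                l: (sum(w * scores[m][l] for m, w in nbrs[n].items()) / total
                    if total else 0.0)
                for l in labels
            }
        scores = new
    return scores


# Toy network: Q1 and Q2 are unannotated queries, A and B are annotated.
toy_edges = [("Q1", "A", 1.0), ("Q1", "B", 0.5), ("Q2", "Q1", 1.0)]
toy_known = {"A": "hydrolase", "B": "kinase"}
toy_scores = diffuse_annotations(toy_edges, toy_known)
best_for_q2 = max(toy_scores["Q2"], key=toy_scores["Q2"].get)
```

In this toy network Q2 has no direct match to an annotated protein, yet it still receives a prediction through Q1. That indirect transfer is the kind of case a global network view is meant to cover.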

Figure 4: Accuracy versus sensitivity graph of the network diffusion method for four different ETA networks for (A) 605 Structural Genomics enzymes; and for (B) 73 non-trivial Structural Genomics enzymes. The numbers inside the parentheses show the area under the curve (AUC) for each curve. The dashed line indicates the performance of the network diffusion method based on multiple six-residue templates.

Mentions: To further assess performance, we constructed networks of ETA matches from each of the modes of ETA described above. Competitive diffusion was then carried out as described previously [27] in order to draw annotation from the global distribution of all matches among all query proteins and all proteins with known functions. The results suggest that the predictive power of the network makes up for the disadvantages of each individual method, since all of the template methods perform nearly equally well. In the 605-protein benchmark test set, the areas under the accuracy-sensitivity receiver operator curves were essentially identical at 0.971, 0.968, 0.965, and 0.965 for multiple six-residue templates, five-residue templates, six-residue templates, and multiple five-residue templates, respectively (see Figure 4A). On closer inspection, however, some slight differences emerge. At 95% accuracy, the network built from multiple six-residue templates per protein has 4 percentage points better sensitivity (84% vs 80%) than the six-residue single-template network, accounting for 24 additional true positives. This improvement was also observed in the benchmark of 73 proteins with less than 30% sequence identity to any true annotated matching protein, as shown in Figure 4B.
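The sketch below shows one way an accuracy-versus-sensitivity curve and its area could be computed from a confidence-ranked list of predictions, and it checks the arithmetic behind the 24 additional true positives. The source does not define these quantities here, so the sketch assumes that accuracy is the fraction of predictions made so far that are correct and that sensitivity is the fraction of all benchmark proteins correctly annotated. The function names, the toy ranking, and the starting point of the curve are hypothetical.

```python
def accuracy_sensitivity_curve(ranked_correct, n_total):
    """Return (sensitivity, accuracy) points along a ranked prediction list.

    ranked_correct: booleans, True if the prediction is correct, sorted by
                    decreasing confidence.
    n_total: number of benchmark proteins (assumed sensitivity denominator).
    """
    points = []
    true_positives = 0
    for k, is_correct in enumerate(ranked_correct, start=1):
        true_positives += is_correct
        sensitivity = true_positives / n_total
        accuracy = true_positives / k  # assumed: correct / predictions made
        points.append((sensitivity, accuracy))
    return points


def area_under_curve(points):
    """Trapezoidal area under accuracy as a function of sensitivity."""
    area = 0.0
    prev_x, prev_y = 0.0, 1.0  # assumed start: accuracy 1 at sensitivity 0
    for x, y in points:
        area += (x - prev_x) * (y + prev_y) / 2.0
        prev_x, prev_y = x, y
    return area


# Toy ranking of four predictions on a four-protein benchmark (made up).
toy_points = accuracy_sensitivity_curve([True, True, False, True], n_total=4)
toy_auc = area_under_curve(toy_points)

# Arithmetic check from the text: a sensitivity gain from 80% to 84%
# on 605 proteins corresponds to about 24 extra true positives.
extra_true_positives = round((0.84 - 0.80) * 605)
```

A gain of 4 percentage points on 605 proteins is 24.2, which rounds to the 24 additional true positives quoted above.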

