Function prediction from networks of local evolutionary similarity in protein structure.

Erdin S, Venner E, Lisewski AM, Lichtarge O - BMC Bioinformatics (2013)

Bottom Line: One proven computational strategy has been to group a few key functional amino acids into templates and search for these templates in other protein structures, so as to transfer function when a match is found. To this end, we previously developed Evolutionary Trace Annotation (ETA) and showed that diffusing known annotations over a network of template matches on a structural genomic scale improved predictions of function. We improve the accuracy and sensitivity of predictions by using multiple templates per protein structure when constructing networks of ETA matches and diffusing annotations.

Affiliation: Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

ABSTRACT

Background: Annotating protein function with both high accuracy and sensitivity remains a major challenge in structural genomics. One proven computational strategy has been to group a few key functional amino acids into templates and search for these templates in other protein structures, so as to transfer function when a match is found. To this end, we previously developed Evolutionary Trace Annotation (ETA) and showed that diffusing known annotations over a network of template matches on a structural genomic scale improved predictions of function. In order to further increase sensitivity, we now let each protein contribute multiple templates rather than just one, and also let the template size vary.
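To make the idea of diffusing annotations over a network of template matches concrete, here is a minimal sketch. It uses a generic iterative neighbor-averaging scheme as a stand-in, not the diffusion procedure used by ETA, which this text does not specify. The function name `diffuse_annotations`, the node names `P1` to `P4`, and the labels `function_A` and `function_B` are all hypothetical.

```python
from collections import defaultdict


def diffuse_annotations(edges, known, iterations=100):
    """Spread function scores from annotated to unannotated nodes.

    edges: iterable of (node_a, node_b) pairs, one per template match.
    known: dict mapping annotated nodes to their function label.
    Returns {node: {label: score}}.

    Generic neighbor averaging with annotated nodes held fixed; this is
    an illustrative assumption, not the published ETA procedure.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = set(neighbors) | set(known)
    labels = sorted(set(known.values()))

    # Annotated nodes start at 1.0 for their own label, everything else at 0.0.
    scores = {
        n: {lab: 1.0 if known.get(n) == lab else 0.0 for lab in labels}
        for n in nodes
    }
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            if n in known or not neighbors[n]:
                # Keep known annotations (and isolated nodes) unchanged.
                updated[n] = scores[n]
            else:
                # Unannotated node: average the scores of its neighbors.
                updated[n] = {
                    lab: sum(scores[m][lab] for m in neighbors[n]) / len(neighbors[n])
                    for lab in labels
                }
        scores = updated
    return scores


# Hypothetical chain of matches: P1 - P2 - P3 - P4, with only the ends annotated.
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
known = {"P1": "function_A", "P4": "function_B"}
scores = diffuse_annotations(edges, known)
best_for_P2 = max(scores["P2"], key=scores["P2"].get)
```

In this toy chain, P2 sits closer to P1 and ends up favoring `function_A`, while P3 favors `function_B`. That shows how a protein with no direct annotation can still receive a ranked prediction through its matches.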

Results: Retrospective benchmarks on 605 Structural Genomics enzymes showed that multiple templates, when combined with single-template predictions, increased sensitivity by up to 14% while keeping accuracy above 91%. Diffusing function globally on networks of single and multiple template matches marginally increased the area under the ROC curve over 0.97, but in a subset of proteins that could not be annotated by ETA, the network approach recovered annotations for the most confident 20-23 of 91 cases with 100% accuracy.
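The trade-off described here can be illustrated with a small sketch. It assumes that accuracy means the fraction of predictions made that are correct, and that sensitivity means the fraction of all benchmark proteins that receive a correct prediction; this text does not spell out the definitions, so both are assumptions. The protein names, labels, and the helper names `accuracy_and_sensitivity` and `combine` are hypothetical, and the data is toy data rather than the benchmark.

```python
def accuracy_and_sensitivity(predictions, truth):
    """predictions: {protein: label}, only for proteins that got a prediction.
    truth: {protein: label} for every benchmark protein.

    Assumed definitions (not taken from the source):
      accuracy    = correct predictions / predictions made
      sensitivity = correct predictions / all benchmark proteins
    """
    correct = sum(1 for p, label in predictions.items() if truth.get(p) == label)
    accuracy = correct / len(predictions) if predictions else 0.0
    sensitivity = correct / len(truth) if truth else 0.0
    return accuracy, sensitivity


def combine(single, multiple):
    """Union of both prediction sets; single-template calls take precedence."""
    return {**multiple, **single}


# Toy data: four proteins, one annotated in single mode, three in multiple mode.
truth = {"p1": "a", "p2": "b", "p3": "a", "p4": "c"}
single = {"p1": "a"}
multiple = {"p1": "a", "p3": "a", "p4": "b"}

single_scores = accuracy_and_sensitivity(single, truth)
combined_scores = accuracy_and_sensitivity(combine(single, multiple), truth)
```

In this toy case the combined set covers more proteins, so sensitivity rises, while the one wrong extra call lowers accuracy. The benchmark result above reports that, in practice, the accuracy cost stayed small.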

Conclusions: We improve the accuracy and sensitivity of predictions by using multiple templates per protein structure when constructing networks of ETA matches and diffusing annotations.

Figure 3: Graph of ETA matches between 2z04A and 3etjA based on six-residue and multiple six-residue templates. Red clusters and red templates are identified in single template mode, while brown and purple clusters, and brown templates are identified in multiple template mode.

Mentions: An example, in Figure 3, illustrates how ETA in multiple template mode recovered annotations missed by ETA with a single template per protein. ETA generated a six-residue template, {257N, 256H, 260H, 254R, 237E, 250E}, from the phosphoribosylaminoimidazole carboxylase ATPase subunit from Aquifex aeolicus (PDB 2z04; chain A) and matched it to an N5-carboxyaminoimidazole ribonucleotide synthetase from Escherichia coli with 30% sequence identity (PDB 3etj; chain A) [31]. However, the reciprocal six-residue template in 3etjA, which is {126Y, 127D, 128G, 245N, 244H} generated from ET cluster {51E, 120K, 126Y, 127D, 128G, 226E, 237N, 238E, 242R, 244H, 245N, 305Y, 307K, 314K}, could not be matched significantly back to the query 2z04A, and as a result there was no prediction in single template mode. In multiple template mode, however, ET identified two other subclusters in 3etjA, {226, 237, 238, 242, 244, 245} (shown in brown) and {51, 120, 226, 237, 238, 242, 244, 245} (shown in purple), and ETA accordingly generated an additional reciprocal six-residue template, {226E, 237N, 238E, 242R, 244H, 245N}, which did match to 2z04A and thereby led to the correct prediction.
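The reciprocity logic in this example can be sketched as follows. The residue sets are copied from the passage above, and the set of significant matches simply encodes the outcome the passage reports. The function `has_reciprocal_match` and the data layout are hypothetical illustrations, not ETA's actual interface, and no geometric matching is performed here.

```python
def has_reciprocal_match(query, target, templates, significant):
    """True if some query template matches the target structure AND some
    target template matches back to the query structure.

    templates:   {protein: [template, ...]}, each template a frozenset of residues.
    significant: set of (source_protein, template, searched_structure) triples
                 recording which template searches were significant.
    """
    forward = any((query, t, target) in significant for t in templates[query])
    backward = any((target, t, query) in significant for t in templates[target])
    return forward and backward


# Templates as listed in the text.
t_2z04 = frozenset({"257N", "256H", "260H", "254R", "237E", "250E"})
t_3etj_single = frozenset({"126Y", "127D", "128G", "245N", "244H"})
t_3etj_extra = frozenset({"226E", "237N", "238E", "242R", "244H", "245N"})

# Outcomes reported in the text: the 2z04A template matched 3etjA, and only
# the additional 3etjA template matched back to 2z04A.
significant = {
    ("2z04A", t_2z04, "3etjA"),
    ("3etjA", t_3etj_extra, "2z04A"),
}

single_mode = {"2z04A": [t_2z04], "3etjA": [t_3etj_single]}
multiple_mode = {"2z04A": [t_2z04], "3etjA": [t_3etj_single, t_3etj_extra]}

found_single = has_reciprocal_match("2z04A", "3etjA", single_mode, significant)
found_multiple = has_reciprocal_match("2z04A", "3etjA", multiple_mode, significant)
```

With one template per protein the backward search fails, so no prediction is made. Adding the extra 3etjA template supplies the missing backward match.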

