Superfamily assignments for the yeast proteome through integration of structure prediction with the gene ontology.

Malmström L, Riffle M, Strauss CE, Chivian D, Davis TN, Bonneau R, Baker D - PLoS Biol. (2007)

Bottom Line: Saccharomyces cerevisiae is one of the best-studied model organisms, yet the three-dimensional structure and molecular function of many yeast proteins remain unknown. This structural data was integrated with process, component, and function annotations from the Saccharomyces Genome Database to assign yeast protein domains to SCOP superfamilies using a simple Bayesian approach. We have predicted the structure of 3,338 putative domains and assigned SCOP superfamily annotations to 581 of them.


Affiliation: Department of Biochemistry, University of Washington, Seattle, Washington, United States of America.

ABSTRACT
Saccharomyces cerevisiae is one of the best-studied model organisms, yet the three-dimensional structure and molecular function of many yeast proteins remain unknown. Yeast proteins were parsed into 14,934 domains, and those lacking sequence similarity to proteins of known structure were folded using the Rosetta de novo structure prediction method on the World Community Grid. This structural data was integrated with process, component, and function annotations from the Saccharomyces Genome Database to assign yeast protein domains to SCOP superfamilies using a simple Bayesian approach. We have predicted the structure of 3,338 putative domains and assigned SCOP superfamily annotations to 581 of them. We have also assigned structural annotations to 7,094 predicted domains based on fold recognition and homology modeling methods. The domain predictions and structural information are available in an online database at http://rd.plos.org/10.1371_journal.pbio.0050076_01.


Figure 2 (pbio-0050076-g002): Integration of Structure Prediction with GO Annotations. Red line represents the superfamily distribution for the predicted structures, P(SF/D); blue line, the superfamily distribution based on GO annotations, P(SF/GO). Black line represents the Bayesian combination (P(SF/D,GO); Equation 2). Only superfamilies with a probability over 0.001 in either category are displayed. The names of the proteins and the GO annotations for which the black line is derived are (A) 1KMDA (Vam7p Px Domain)/Golgi to vacuole transport (process), (B) 1IOUA (v-SNARE)/vesicle fusion (process), (C) 1F32 (Ascaris pepsin inhibitor-3)/endopeptidase inhibitor activity (function), and (D) 1DUJA (Spindle Assembly Checkpoint protein Human Mad2)/Chromosome (component).

Mentions: The superfamily distributions derived from the structure prediction data alone (P(SF/D)), from the GO annotations alone (P(SF/GO)), and from the two together (P(SF/D,GO)) are compared in Figure 2 for four proteins whose true SCOP superfamilies are known, showing the synergy between the two sources of information. The ambiguities in P(SF/D) (red line) and P(SF/GO) (blue line) are reduced upon integration into P(SF/D,GO) (black line), resulting in less ambiguous predictions for many difficult-to-annotate domains. The overall performance of P(SF/D,GO) over the benchmark set (see Materials and Methods) is shown in Figure 3. A total of 177 yeast domains (see Table 2) were assigned a structural superfamily with a P(SF/D,GO) over 0.8 (Table S3).
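
The paper's Equation 2 is not reproduced in this excerpt, so the following is only a minimal sketch of how two such distributions could be combined. It assumes that the structure prediction and the GO annotation are conditionally independent given the superfamily, which gives P(SF/D,GO) proportional to P(SF/D) * P(SF/GO) / P(SF). The function name, the superfamily labels, the uniform prior, and all numbers are hypothetical illustrations, not values from the paper.

```python
from typing import Dict


def combine_superfamily_posteriors(
    p_sf_given_d: Dict[str, float],
    p_sf_given_go: Dict[str, float],
    prior: Dict[str, float],
) -> Dict[str, float]:
    """Combine two superfamily distributions under an ASSUMED
    conditional-independence (naive Bayes) model:
        P(SF/D,GO) is proportional to P(SF/D) * P(SF/GO) / P(SF).
    This is an illustrative stand-in, not the paper's Equation 2.
    """
    unnormalized = {}
    for sf, p_prior in prior.items():
        if p_prior <= 0.0:
            continue  # a superfamily with zero prior cannot be assigned
        unnormalized[sf] = (
            p_sf_given_d.get(sf, 0.0) * p_sf_given_go.get(sf, 0.0) / p_prior
        )
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("the two sources share no superfamily with nonzero probability")
    # Renormalize so the combined values sum to 1.
    return {sf: value / total for sf, value in unnormalized.items()}


# Hypothetical toy input: three made-up superfamilies with a uniform prior.
prior = {"SF_A": 1 / 3, "SF_B": 1 / 3, "SF_C": 1 / 3}
p_d = {"SF_A": 0.5, "SF_B": 0.4, "SF_C": 0.1}   # from predicted structure alone
p_go = {"SF_A": 0.6, "SF_B": 0.1, "SF_C": 0.3}  # from GO annotation alone

combined = combine_superfamily_posteriors(p_d, p_go, prior)

# Keep only assignments above the 0.8 cutoff mentioned in the text.
confident = {sf: p for sf, p in combined.items() if p > 0.8}
print(combined)
print(confident)
```

In this toy input neither source alone gives any superfamily a probability above 0.8, but the combined distribution does for SF_A, which mirrors the reduction in ambiguity described above.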

