Ab initio modeling of small proteins by iterative TASSER simulations.

Wu S, Skolnick J, Zhang Y - BMC Biol. (2007)

Bottom Line: The overall result was comparable with the all-atomic ROSETTA simulation, but the central processing unit (CPU) time required by I-TASSER was much shorter (5 CPU hours vs. 150 CPU days for ROSETTA). Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE II, the average performance of I-TASSER is either much better or similar at a lower computational cost. These data, together with the strong performance of the automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP)7 experiment, demonstrate new progress in automated ab initio model generation.


Affiliation: Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, Lawrence, KS 66047, USA. stwu@ku.edu

ABSTRACT

Background: Predicting three-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes easier if close homologous proteins have been solved, as high-resolution models can be built by aligning target sequences to the solved homologous structures. However, for sequences without similar folds in the Protein Data Bank (PDB) library, the models have to be predicted from scratch. Progress in ab initio structure modeling has been slow. The aim of this study was to extend the TASSER (threading/assembly/refinement) method to ab initio modeling and to examine systematically its ability to fold small single-domain proteins.

Results: We developed I-TASSER by implementing the TASSER method iteratively, and tested it on three folding benchmarks of small proteins. First, I-TASSER models were generated for 16 small proteins (< 90 residues); they had an average Cα root mean square deviation (RMSD) of 3.8Å, with 6 of them having a Cα-RMSD < 2.5Å. The overall result was comparable with the all-atomic ROSETTA simulation, but the central processing unit (CPU) time required by I-TASSER was much shorter (5 CPU hours vs. 150 CPU days for ROSETTA). Second, 20 small proteins (< 120 residues) were used. I-TASSER folded four of them with a Cα-RMSD < 2.5Å. The average Cα-RMSD of the I-TASSER models was 3.9Å, whereas it was 5.9Å using TOUCHSTONE-II software. Finally, 20 non-homologous small proteins (< 120 residues) were taken from the PDB library. An average Cα-RMSD of 3.9Å was obtained for this third benchmark, with seven cases having a Cα-RMSD < 2.5Å.
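All of the accuracy figures above are Cα-RMSD values. As a minimal sketch of the metric itself, the Python function below computes the RMSD over paired Cα coordinates. It assumes the model and native structures have already been optimally superimposed and that residues are paired one-to-one. The function name and the toy coordinates are illustrative and do not come from the paper, which does not describe its RMSD code here.

```python
import math


def ca_rmsd(model, native):
    """RMSD in Å between paired Cα coordinates.

    Assumption: both structures are already superimposed;
    no rotation or translation is applied here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, native):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


# Toy coordinates (hypothetical, not taken from any benchmark protein):
# every model atom is displaced by exactly 1 Å along y.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(ca_rmsd(model, native))  # 1.0
```

In practice the superposition step (for example, a Kabsch-type least-squares fit) would be run first; it is left out here to keep the sketch dependency-free.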

Conclusion: Our simulation results show that I-TASSER can consistently predict the correct folds, and sometimes high-resolution models, for small single-domain proteins. Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE II, the average performance of I-TASSER is either much better or similar at a lower computational cost. These data, together with the strong performance of the automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP)7 experiment, demonstrate new progress in automated ab initio model generation. The I-TASSER server is freely available for academic users at http://zhang.bioinformatics.ku.edu/I-TASSER.
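The CPU-time comparison reported in the Results (150 CPU days for ROSETTA, 5 CPU hours for I-TASSER) can be put on a common scale with a short calculation. The variable names below are illustrative, and the ratio is only as precise as the two rounded figures quoted in the abstract.

```python
HOURS_PER_DAY = 24

# Figures quoted in the abstract; treated here as exact for illustration.
rosetta_cpu_hours = 150 * HOURS_PER_DAY  # 150 CPU days
itasser_cpu_hours = 5                    # 5 CPU hours

speedup = rosetta_cpu_hours / itasser_cpu_hours
print(f"ROSETTA: {rosetta_cpu_hours} CPU hours, I-TASSER: {itasser_cpu_hours} CPU hours")
print(f"Ratio: about {speedup:.0f}x less CPU time for I-TASSER")
```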



Figure 2: Examples of I-TASSER models from three independent benchmark sets. I-TASSER models are shown in green and native structures in blue. (A–C) are from benchmark I (Bradley et al [13]); (D–F) are from benchmark II (Zhang et al [12]); and (G–I) are from benchmark III, selected directly from the PDB library. Column 1 contains the high-resolution models with a Cα-RMSD ≤ 1.5Å; column 2 contains the medium-resolution models with a Cα-RMSD of 1.5–5Å; column 3 contains the low-resolution models with a Cα-RMSD > 5Å. The Cα-RMSD values for the examples are: (A) 1ogwA_ (1.1Å), (B) 1di2A_ (2.3Å), (C) 1dcjA_ (10.0Å), (D) 1cy5A (1.5Å), (E) 1pgx (3.1Å), (F) 1gnuA (8.2Å), (G) 1cqkA (1.5Å), (H) 1gyvA (3.3Å), (I) 1no5A (10.5Å). The pictures were generated using PyMOL software [45].
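The resolution classes in the caption amount to a simple threshold rule on Cα-RMSD. The sketch below applies that rule to the nine values listed for panels A–I. The function name is illustrative, and treating a value of exactly 5Å as medium follows the caption's "> 5Å" definition of low resolution.

```python
def resolution_class(rmsd):
    """Classify a Cα-RMSD (in Å) using the thresholds from the Figure 2 caption."""
    if rmsd <= 1.5:
        return "high"
    if rmsd <= 5.0:
        return "medium"
    return "low"


# Cα-RMSD values (Å) as listed in the caption, panels A through I.
examples = {
    "1ogwA_": 1.1, "1di2A_": 2.3, "1dcjA_": 10.0,
    "1cy5A": 1.5, "1pgx": 3.1, "1gnuA": 8.2,
    "1cqkA": 1.5, "1gyvA": 3.3, "1no5A": 10.5,
}

by_class = {"high": [], "medium": [], "low": []}
for name, rmsd in examples.items():
    by_class[resolution_class(rmsd)].append(name)

for label, names in by_class.items():
    print(label, names)
```

Each class receives three structures, matching the three columns of the figure.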

Mentions: Table 1 shows the modeling results of I-TASSER on the 16 small proteins that were used by Bradley et al [13]. This benchmark set includes 3 α proteins, 2 β proteins, and 11 αβ proteins with pairwise sequence identity < 30%. If we define a high-resolution model as one with a Cα-RMSD to native ≤ 1.5Å, I-TASSER predicts a high-resolution model for one target, '1ogwA' (see Figure 2A for the model superimposed on the native structure). Taking the best of the top five clusters, most of the targets (12/16) reached medium resolution, with a Cα-RMSD of 1.5–5Å. For the remaining three targets, I-TASSER could not correctly fold the proteins. One of them (1tif_) has a long swinging tail at the C-terminus. For the other two (1dcjA_ and 1o2fB_), both of which have a topology of four parallel β-strands flanked by two α-helices, the imperfection of the I-TASSER force field is clearly responsible for the failure, because the energy of the native structures is higher than that of the largest clusters.

