TIM-Finder: a new method for identifying TIM-barrel proteins.

Si JN, Yan RX, Wang C, Zhang Z, Su XD - BMC Struct. Biol. (2009)

Bottom Line: The triosephosphate isomerase (TIM)-barrel fold occurs frequently in the proteomes of different organisms, and the known TIM-barrel proteins have been found to play diverse functional roles. With the assistance of a Support Vector Machine (SVM), three descriptors were combined to obtain a new method with improved performance, which we call TIM-Finder. TIM-Finder can serve as a competitive tool for proteome-wide TIM-barrel protein identification.

Affiliation: State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China. sijingna@gmail.com

ABSTRACT

Background: The triosephosphate isomerase (TIM)-barrel fold occurs frequently in the proteomes of different organisms, and the known TIM-barrel proteins have been found to play diverse functional roles. To accelerate the exploration of the sequence-structure protein landscape in the TIM-barrel fold, a computational tool that allows sensitive detection of TIM-barrel proteins is required.

Results: To develop a new TIM-barrel protein identification method, we considered three descriptors: a sequence-alignment-based descriptor using PSI-BLAST E-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. With the assistance of a Support Vector Machine (SVM), the three descriptors were combined to obtain a new method with improved performance, which we call TIM-Finder. When tested on the whole proteome of Bacillus subtilis, TIM-Finder detected 194 TIM-barrel proteins at a 99% confidence level, outperforming the PSI-BLAST search as well as one existing fold recognition method.

Conclusions: TIM-Finder can serve as a competitive tool for proteome-wide TIM-barrel protein identification. The TIM-Finder web server is freely accessible at http://202.112.170.199/TIM-Finder/.

Figure 4: Cartoon representation of a TIM-barrel protein (PDB entry: 1n55). The structural location of the most frequently occurring PROSITE motif (entry: PS00171, pattern: [AVG]-[YLV]-E-P-[LIVMEPKST]-[WYEAS]-[SAL]-[IV]-[GN]-[TEKDVS]-[GKNAD]) in the 3D model is shown in magenta.
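
Because PROSITE patterns are essentially regular expressions, the PS00171 pattern in the caption can be translated mechanically and scanned against a sequence. The following is only a minimal sketch in Python, not the authors' implementation: the function names `prosite_to_regex` and `find_motif` and the toy sequence are hypothetical, and the translator covers only the basic PROSITE syntax (`-` separators, `[..]` allowed residues, `{..}` forbidden residues, `x` wildcards, `(n)` or `(n,m)` repeats, and `<` / `>` anchors outside brackets).

```python
import re

# PS00171 pattern as given in the figure caption.
PS00171 = "[AVG]-[YLV]-E-P-[LIVMEPKST]-[WYEAS]-[SAL]-[IV]-[GN]-[TEKDVS]-[GKNAD]"


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    # Drop whitespace, a trailing period, and the '-' position separators.
    regex = "".join(pattern.split()).rstrip(".").replace("-", "")
    # {..} lists forbidden residues; this step must precede the repeat step.
    regex = regex.replace("{", "[^").replace("}", "]")
    # (n) and (n,m) are repeat counts.
    regex = regex.replace("(", "{").replace(")", "}")
    # Lowercase x matches any residue; amino acid codes are uppercase.
    regex = regex.replace("x", ".")
    # < and > anchor the pattern to the sequence termini.
    return regex.replace("<", "^").replace(">", "$")


def find_motif(pattern: str, sequence: str) -> list:
    """Return (0-based start, matched text) for each non-overlapping hit."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in compiled.finditer(sequence)]


# Toy sequence made up for illustration; it is not the sequence of 1n55.
toy_sequence = "MKKAYEPVWAIGTGKTA"
print(prosite_to_regex(PS00171))
# [AVG][YLV]EP[LIVMEPKST][WYEAS][SAL][IV][GN][TEKDVS][GKNAD]
print(find_motif(PS00171, toy_sequence))
# [(3, 'AYEPVWAIGTG')]
```

A scan of this kind only reports whether and where a pattern occurs; how such occurrences are turned into the motif-based descriptor is not specified in this excerpt.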

Mentions: The motif-based descriptor yields an AUC value of 0.792, which is lower than the values obtained with the PSI-BLAST- and SSEA-based descriptors (Figure 2). With the false positive rate (FPR) controlled at ≤ 5%, the motif-based descriptor correctly recognizes only 46.0% of the TIM-barrel proteins. Sequence motifs have been reported to correlate with protein folds [25,32]. The central idea of the motif-based descriptor is to recognize TIM-barrel proteins based on motif-fold compatibility. In this work, we used the PROSITE database because it is one of the most widely used and comprehensive sequence motif databases. PROSITE motifs are mainly defined as patterns (i.e., regular expressions) and profiles, which were derived from the analysis of sequences of known function. For each PROSITE motif, its compatibility with the TIM-barrel fold was measured by a score called S(TIM/motif). Of the 2096 motifs under investigation, 103 have S(TIM/motif) > 0.1, including 91 patterns and 12 profiles. As an illustrative example, we provide the 3D model of a TIM-barrel protein and the structural location of the PROSITE motif PS00171 (Figure 4), which had the highest S(TIM/motif) score. Because TIM-barrel proteins are functionally diverse, PROSITE motifs are enriched in this fold. Therefore, the motif-based descriptor, which represents local sequence features of proteins, should be particularly suitable for recognizing TIM-barrel proteins. Additionally, the motif-based descriptor is alignment-independent, meaning that it should be complementary to the other two alignment-related descriptors (i.e., the PSI-BLAST- and SSEA-based descriptors). Thus, although the motif-based descriptor is not powerful on its own, it should be informative when combined with the other two descriptors.
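
The two figures of merit quoted above, the AUC and the fraction of TIM-barrel proteins recognized at a ≤ 5% FPR, can both be read off a ROC curve. The sketch below shows one common way to compute them in plain Python; it is an illustration under stated assumptions, not the evaluation procedure used in the paper, which this excerpt does not describe. The function names and the toy scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """Return ROC points (FPR, TPR) from sweeping a threshold over the scores.

    labels: 1 for a TIM-barrel protein (positive), 0 otherwise (negative).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at_fpr(points, max_fpr=0.05):
    """Highest true positive rate reachable with FPR <= max_fpr."""
    return max(tpr for fpr, tpr in points if fpr <= max_fpr)


# Toy descriptor scores and labels, made up for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0, 1, 0]
curve = roc_points(toy_scores, toy_labels)
print(auc(curve))               # 0.75
print(tpr_at_fpr(curve, 0.05))  # 0.5
```

With only four negatives in the toy data, the ≤ 5% FPR constraint allows no false positives at all, so the reported rate is the fraction of positives ranked above the best-scoring negative.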

