Identification of antifreeze proteins and their functional residues by support vector machine and genetic algorithms based on n-peptide compositions.

Yu CS, Lu CH - PLoS ONE (2011)

Bottom Line: For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.


Affiliation: Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan. yucs@fcu.edu.tw

ABSTRACT
For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. The identification of AFPs is difficult because they exist as evolutionarily divergent types, and because their sequences and structures are present in limited numbers in currently available databases. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. Moreover, we were able to identify residues involved in ice binding without requiring knowledge of the three-dimensional structures of these AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.
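
As a rough illustration of what an n-peptide composition feature could look like, the sketch below counts contiguous n-residue fragments in a sequence and normalizes them to frequencies. This is an assumption-laden example rather than the authors' implementation: the function names `n_peptide_composition` and `feature_vector`, the restriction to contiguous fragments, and the choice of n = 1 and n = 2 are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_peptide_composition(sequence, n):
    """Return the frequency of every possible contiguous n-peptide.

    Hypothetical helper: windows containing non-standard residues still
    count toward the total but have no entry in the returned mapping.
    """
    sequence = sequence.upper()
    windows = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    counts = Counter(windows)
    total = len(windows)
    keys = ("".join(p) for p in product(AMINO_ACIDS, repeat=n))
    return {k: (counts[k] / total if total else 0.0) for k in keys}


def feature_vector(sequence, ns=(1, 2)):
    """Concatenate the compositions for several n into one flat list."""
    vector = []
    for n in ns:
        composition = n_peptide_composition(sequence, n)
        # Sort keys so every sequence yields features in the same order.
        vector.extend(composition[k] for k in sorted(composition))
    return vector
```

With `ns=(1, 2)` each sequence maps to a vector of 20 + 400 = 420 values, which is the kind of fixed-length input a support vector machine expects; a genetic algorithm could then search over subsets of these features.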

Figure 3. The surface of the eelpout type III AFP (PDB ID 1msi). (A) Key residues selected by the SVMGA are labeled in black. Residues Q9, V20, M21 and Q44, which were identified as key residues in a mutagenesis study but not by the SVMGA, are shown in cyan. (B) A view of the ice-binding interface; all residues reported to be part of the interface are labeled. Residues identified by the SVMGA are shown in red and yellow. Residues known to be important in ice binding but not identified by the SVMGA are shown in cyan. All other residues not identified by the SVMGA are shown in gray.

Mentions: To identify AFPs, we used an integrated machine-learning method, SVMGA, that uses multiple n-peptide composition features. Our results show that sequentially divergent AFPs can be identified by their shared sequence characteristics, because neither a test sequence nor any of its homologs is used in the training set. A set of n-peptide composition-based SVM predictors was combined to accurately recognize AFPs and, more importantly, to identify the key functional residues neighboring the ice-binding surfaces. Jia and Davies [7] have characterized defining residue repeats in AFP sequences, e.g., the alanine-rich α-helix of type I AFPs (Figure 2A), and TXT (Figure 2C) or TCT (Figure 2D) in insect AFPs. The feature attributes selected by our SVMGA approach also included these defining residue repeats. Some of the key SVMGA residues in representative AFP structures form relatively flat planes, as shown by the red and yellow clustered regions in Figures 2 and 3. Additionally, our SVMGA approach identified some residues that lie at the interface between two polypeptide chains of the crystallized form used for structure determination, e.g., T13 and T24 in chain A of winter flounder antifreeze protein (PDB ID 1wfa) [35], although the active protein is a monomer (Figure 2A). Other key residues identified by SVMGA, e.g., A8, L12, N16, and T24, all lie on the same side of the flat ice-binding interface, which is consistent with the T/N/L ice-binding motif reported in previous work [35]. A similar example is the β-sheet plane of chain A in 1ezg (Figure 2D). Although the TCX tri-peptide parallel strands [36] align perfectly in the dimer form, this flatter ice-binding surface is found in the monomer and is indicated by the red and yellow coloration at the functional interface.
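
The defining residue repeats mentioned above (TXT, where X is any residue, and TCT) can be located in a sequence with a simple pattern scan. The sketch below is only an illustration of what such repeats look like in sequence data; it is not part of the SVMGA method, and the function name `find_motif` and the toy sequence in the usage note are hypothetical.

```python
import re


def find_motif(sequence, pattern):
    """Return (1-based position, matched tri-peptide) for every occurrence.

    `pattern` is a regular expression over one-letter residue codes,
    e.g. "T.T" for TXT (any residue in the middle) or "TCT".
    A lookahead is used so that overlapping occurrences are all reported.
    """
    lookahead = re.compile(f"(?=({pattern}))")
    return [(m.start() + 1, m.group(1))
            for m in lookahead.finditer(sequence.upper())]
```

For the made-up sequence `"ATCTGTATA"`, `find_motif("ATCTGTATA", "T.T")` reports the overlapping hits TCT, TGT and TAT at positions 2, 4 and 6, while `find_motif("ATCTGTATA", "TCT")` reports only the hit at position 2.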

