Identification of antifreeze proteins and their functional residues by support vector machine and genetic algorithms based on n-peptide compositions.

Yu CS, Lu CH - PLoS ONE (2011)

Bottom Line: For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.


Affiliation: Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan. yucs@fcu.edu.tw

ABSTRACT
For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. The identification of AFPs is difficult because they exist as evolutionarily divergent types, and because their sequences and structures are present in limited numbers in currently available databases. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. Moreover, we were able to identify residues involved in ice binding without requiring knowledge of the three-dimensional structures of these AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.


Examples of key residues mapped onto the surfaces of the seven representative AFPs used in the cross-validation tests. The structures were drawn using PyMOL [37]. Identified key residues are denoted in red (more votes) and yellow (fewer votes) for the following PDB structures: (A) the winter flounder α-helical AFP (PDB ID 1wfa) [35]; (B) the snow flea AFP (PDB ID 2pne) [38]; (C) the β-helical spruce budworm AFP (PDB ID 1eww) [13]; (D) the β-helical beetle Tenebrio molitor AFP (PDB ID 1ezg) [36].

Mentions: For the different coding-scheme SVM classifiers used in this study, we were able to reduce the number of feature attributes required by at least 50% after implementing the GA. Consequently, each remaining classifier was well suited to identifying the corresponding type of AFP (Table 4). To understand why the features were selected as classifiers, we assigned a number (vote) when the pattern of residues in a sequence matched a GA-selected feature attribute of a coding scheme. A sequence position was marked as an SVMGA key residue if it received a majority of the jury votes from the 13 coding schemes that we used. For instance, the dipeptide LT was selected in the D0 scheme, and the interval dipeptide T(X2)T was selected in the D1 scheme; thus, for the short peptide NTALT, the L at the fourth position and the first T each received one vote, and the second T received two votes (Table 5). The representative AFPs are presented in Figure 2, and their SVMGA key residues are marked. Residues with >6 votes, with 4 or 5 votes, and with <3 votes are colored red, yellow, and gray, respectively. The average number of SVMGA key residues in AFP sequences and in non-AFP sequences was confirmed to be significantly different. Approximately 70% of the SVMGA-selected key residues are solvent exposed (data not shown), as expected, because these residues are more likely to interact with ice.
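
The voting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `parse_pattern`, `count_votes`, and `key_residues` are hypothetical, the `(Xk)` notation is read as a gap of k arbitrary residues, and two assumptions are made that the text does not state explicitly, namely that each coding scheme gives a position at most one vote and that "majority" means more than half of the schemes.

```python
import re
from collections import Counter

GAP = re.compile(r"\(X(\d+)\)")  # gap notation such as "(X2)"


def parse_pattern(pattern):
    """Return [(offset, residue), ...] for the literal residues of a pattern.

    "LT" -> [(0, "L"), (1, "T")]; "T(X2)T" -> [(0, "T"), (3, "T")].
    """
    residues, offset, pos = [], 0, 0
    while pos < len(pattern):
        gap = GAP.match(pattern, pos)
        if gap:
            offset += int(gap.group(1))  # skip the unspecified residues
            pos = gap.end()
        else:
            residues.append((offset, pattern[pos]))
            offset += 1
            pos += 1
    return residues


def count_votes(sequence, selected):
    """Count votes per position (0-based list, one entry per residue).

    `selected` maps a coding-scheme name to its GA-selected patterns.
    Assumption: a scheme contributes at most one vote to a position.
    """
    votes = Counter()
    for patterns in selected.values():
        hit = set()
        for pattern in patterns:
            residues = parse_pattern(pattern)
            span = residues[-1][0] + 1
            for start in range(len(sequence) - span + 1):
                if all(sequence[start + off] == aa for off, aa in residues):
                    hit.update(start + off for off, _ in residues)
        for position in hit:
            votes[position] += 1
    return [votes[i] for i in range(len(sequence))]


def key_residues(votes, n_schemes=13):
    """Positions whose vote count exceeds half the schemes (assumed majority rule)."""
    return [i for i, v in enumerate(votes) if v > n_schemes / 2]


# The NTALT example from the text, using only the two patterns it names.
example = count_votes("NTALT", {"D0": ["LT"], "D1": ["T(X2)T"]})
print(example)  # [0, 1, 0, 1, 2]
```

With only two of the 13 schemes supplied, no position in NTALT reaches a majority, so `key_residues(example)` returns an empty list; the per-position counts match the one, one, and two votes given in the text for the first T, the L, and the second T.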

