Identification of antifreeze proteins and their functional residues by support vector machine and genetic algorithms based on n-peptide compositions.

Yu CS, Lu CH - PLoS ONE (2011)

Bottom Line: For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.


Affiliation: Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan. yucs@fcu.edu.tw

ABSTRACT
For the first time, multiple sets of n-peptide compositions from antifreeze protein (AFP) sequences of various cold-adapted fish and insects were analyzed using support vector machine and genetic algorithms. The identification of AFPs is difficult because they exist as evolutionarily divergent types, and because their sequences and structures are present in limited numbers in currently available databases. Our results reveal that it is feasible to identify the shared sequential features among the various structural types of AFPs. Moreover, we were able to identify residues involved in ice binding without requiring knowledge of the three-dimensional structures of these AFPs. This approach should be useful for genomic and proteomic studies involving cold-adapted organisms.
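The n-peptide composition of a sequence can be illustrated with a minimal sketch. It assumes a plain contiguous n-gram frequency over the 20 standard amino acids; the exact encoding used by the authors, and how the features are passed to the support vector machine and genetic algorithm, are not described here. The function name `n_peptide_composition` and the constant `AMINO_ACIDS` are hypothetical.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_peptide_composition(sequence, n=2):
    """Return the relative frequency of every length-n peptide in a sequence.

    Assumption: contiguous, overlapping windows; windows containing a
    non-standard residue (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    windows = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    counts = Counter(valid)
    total = len(valid)
    # Fixed-length feature vector: one entry per possible n-peptide (20**n keys).
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    return {k: (counts[k] / total if total else 0.0) for k in keys}


# Toy sequence, made up for illustration only.
composition = n_peptide_composition("AAAT", n=2)
nonzero = {k: v for k, v in composition.items() if v > 0}
print(nonzero)
```

Each sequence then maps to a fixed-length vector (400 entries for n=2), which is the kind of input a classifier such as a support vector machine expects.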


Figure 4: Rate of identifying the 369 AFPs from the second independent set. Each bar correlates the identification accuracy with a range of maximum SI values, found from the y axis of Figure 1 in specific ranges of SI for the different species.

Using the 369 AFPs in the second independent dataset (Figure 4), for which no structural information was available, the identification accuracy diminished as the evolutionary distance of a protein sequence from the model fish and insect sequences increased. For sequences with very low SI values (approximately 15 to 20%), especially those from algae, bacteria, and plants, our approach gave an identification rate of approximately 20%. The identification rate for fish AFPs was around 70%, even for sequences with SI values below 20%. In fact, we believe that the features encoded in the fish and insect sequences may be used to identify AFPs from evolutionarily divergent organisms. As more sequence data for AFPs are accumulated, those data can be used to further characterize the mechanisms of cold adaptation. Finally, our approach can be used as an efficient way to obtain high-throughput identification of protein function on a genome-wide scale. We have implemented our method as a web-based service, iAFP, available at http://140.134.24.89/~iafp/.
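The per-range identification rates plotted in Figure 4 amount to a simple binning calculation, sketched below. This is an illustration only: the function name `identification_rate_by_si`, the bin edges, and the records are made up and do not reproduce the paper's data or its exact binning.

```python
def identification_rate_by_si(records, bin_edges):
    """Group sequences by maximum SI and return the fraction identified per bin.

    records   -- iterable of (max_si_percent, identified) pairs
    bin_edges -- ascending list of edges; each bin is the half-open range [lo, hi)
    Bins with no sequences map to None rather than a rate.
    """
    bins = list(zip(bin_edges[:-1], bin_edges[1:]))
    hits = [0] * len(bins)
    totals = [0] * len(bins)
    for si, identified in records:
        for idx, (lo, hi) in enumerate(bins):
            if lo <= si < hi:
                totals[idx] += 1
                hits[idx] += int(bool(identified))
                break
    return {
        bins[i]: (hits[i] / totals[i] if totals[i] else None)
        for i in range(len(bins))
    }


# Hypothetical records: (maximum SI in percent, was the AFP identified?).
toy_records = [(16, True), (18, False), (25, True), (35, True), (38, True)]
rates = identification_rate_by_si(toy_records, [10, 20, 30, 40])
for (lo, hi), rate in rates.items():
    print(f"SI {lo}-{hi}%: {rate}")
```

Running the same calculation separately per species group would give one bar series per group, which is how a comparison such as fish versus algae, bacteria, and plants could be tabulated.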

