ProLoc-GO: utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization.

Huang WL, Tung CW, Ho SW, Hwang SF, Ho SY - BMC Bioinformatics (2008)

Bottom Line: ProLoc-GO using input sequences yields test accuracies of 88.1% and 83.3% for SCL12 and SCL16, respectively, which are significantly better than those of SVM-based methods using amino acid composition (AAC) with acid pairs and AAC with dipeptide composition, which achieve test accuracies below 35%. The growth of Gene Ontology in size and popularity has increased the effectiveness of GO-based features. The prediction system using ProLoc-GO with input sequences of query proteins for protein subcellular localization has been implemented (see Availability).

Affiliation: Institute of Bioinformatics, National Chiao Tung University, Hsinchu, Taiwan. wenlinhuang2001@yahoo.com.tw

ABSTRACT

Background: Gene Ontology (GO) annotation, which describes the function of genes and gene products across species, has recently been used to predict protein subcellular and subnuclear localization. Existing GO-based prediction methods for protein subcellular localization use the known accession numbers of query proteins to obtain their annotated GO terms. An accurate method for predicting the subcellular localization of novel proteins without known accession numbers, using only the input sequence, is worth developing.

Results: This study proposes an efficient sequence-based method (named ProLoc-GO) that mines informative GO terms to predict protein subcellular localization. For each protein, BLAST is used to obtain a homolog with a known accession number, which is then used to retrieve the GO annotation. A large number n of all annotated GO terms that have ever appeared are then obtained from a large set of training proteins. A novel genetic-algorithm-based method (named GOmining) combined with a support vector machine (SVM) classifier is proposed to simultaneously identify a small number m out of the n GO terms as input features to SVM, where m

Conclusion: The growth of Gene Ontology in size and popularity has increased the effectiveness of GO-based features. GOmining can serve as a tool for selecting informative GO terms in solving sequence-based prediction problems. The prediction system using ProLoc-GO with input sequences of query proteins for protein subcellular localization has been implemented (see Availability).

Figure 6: Prediction flowchart of ProLoc-GO using both classifiers SVM-IGO and SVM-GO.

Mentions: As shown in Fig. 6, each query protein is first BLASTed with h = 1 and e = 10^-9 against the Swiss-Prot database to obtain a homolog with a known accession number. If no such homolog exists, the BLAST threshold e is relaxed until one is found, where h = 1 and e ∈ {10^-9, 10^-8, ..., 10^-1}. The accession number of the homolog of each protein sequence in SCL12 and SCL16 was obtained by using BLAST with h = 1 and e = 10^-9. This accession number is used as input to the GOA database to retrieve the corresponding k (>1) GO terms: GO:1, GO:2, ..., GO:k. If none of the k GO terms belongs to the set of m informative GO terms, the sequence is represented as an n-dimensional binary vector and is predicted by the SVM-GO classifier. Otherwise, the sequence is represented as an m-dimensional binary vector and is predicted by the SVM-IGO classifier. Notably, the SVM-GO classifier predicts only a very small percentage of input sequences. ProLoc-GO thus combines the two classifiers SVM-GO and SVM-IGO for subcellular localization prediction.
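
The routing logic in this paragraph can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `blast_top_hit` is a hypothetical callable standing in for a BLAST search against Swiss-Prot with h = 1, the function names `find_homolog`, `encode`, and `route` are invented for this sketch, and returning `None` when no homolog is found even at e = 10^-1 is an assumption, since the text does not describe that case. The GOA lookup and the trained SVM models are omitted.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# E-value thresholds tried in order, from strictest (1e-9) to loosest (1e-1).
E_VALUES = [10.0 ** -k for k in range(9, 0, -1)]


def find_homolog(
    sequence: str,
    blast_top_hit: Callable[[str, float], Optional[str]],
) -> Optional[str]:
    """Relax the E-value threshold until the search returns a top hit.

    `blast_top_hit(sequence, e)` is a hypothetical stand-in for a BLAST
    search with h = 1; it returns an accession number or None.
    """
    for e in E_VALUES:
        accession = blast_top_hit(sequence, e)
        if accession is not None:
            return accession
    return None  # Assumption: the source does not say what happens here.


def encode(go_terms: Sequence[str], vocabulary: Sequence[str]) -> List[int]:
    """Binary vector with a 1 for each vocabulary term that is annotated."""
    annotated = set(go_terms)
    return [1 if term in annotated else 0 for term in vocabulary]


def route(
    go_terms: Sequence[str],
    informative_terms: Sequence[str],
    all_terms: Sequence[str],
) -> Tuple[str, List[int]]:
    """Pick the classifier name and feature vector for one query protein."""
    if set(go_terms) & set(informative_terms):
        # At least one informative term: m-dimensional vector, SVM-IGO.
        return "SVM-IGO", encode(go_terms, informative_terms)
    # No informative term: fall back to the n-dimensional vector, SVM-GO.
    return "SVM-GO", encode(go_terms, all_terms)
```

In this sketch, `informative_terms` plays the role of the m informative GO terms and `all_terms` the role of the full set of n GO terms; the returned vector would then be passed to the corresponding trained classifier.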

